Session Initiation Protocol (SIP) Basics

Jul 8, 2021

Introduction

The Session Initiation Protocol (SIP) is a signaling protocol used for initiating, maintaining, and terminating real-time sessions that include voice, video, and messaging applications. It is used to signal and control multimedia communication sessions in Internet telephony for voice and video calls, in private IP telephone systems, in instant messaging over Internet Protocol (IP) networks, and in mobile phone calling over LTE (VoLTE). You don’t need to understand any of these services before learning SIP. This article does not aim to make you an expert in SIP applications. It offers a quick overview and a helping hand for beginners who want to understand the concept of SIP without first reading through the lengthy RFC documents (reading them is highly recommended for a deeper understanding). So let’s jump into it.

Session Initiation Protocol is one of the most common protocols used in VoIP technology. It is a standard (RFC 3261) put forward by the Internet Engineering Task Force (IETF). SIP's functionality is limited to the setup, control, and teardown of sessions. The details of the data exchange within a session are not controlled by SIP.

SIP Functionality

SIP is an application-layer control protocol that can establish, modify, and terminate multimedia sessions. SIP can also invite participants to already existing sessions, such as multicast conferences. SIP transparently supports name mapping and redirection services, which support personal mobility: users can maintain a single externally visible identifier regardless of their network location.

SIP supports five facets:

User location: Determination of the end system to be used for communication (translating from a user’s name to their current network address).

User availability: Determination of the willingness of the called party to engage in communications.

User capabilities: Determination of the media and media parameters to be used.

Session setup: “Ringing”, the establishment of session parameters at both the called and the calling party.

Session management: Including transfer and termination of sessions, modifying session parameters, and invoking services.

All other key functions are handled by other protocols. For example, SIP does not do conference control, it is not a resource reservation protocol, and it does not provide quality of service (QoS). SIP can work in a framework with other protocols to make sure these roles are filled, but SIP does not perform them itself. SIP can function alongside SOAP, HTTP, XML, VXML, RTP, RTSP, SDP, and others.

SIP Components

In SIP, the interacting entities are called User Agents (UAs). A user agent is logically divided into two roles, and a single UA can act in both.

User Agent Clients (UAC)

A UAC is typically the end-user side: an application running on a device used by a person. It may be a softphone app running on a PC or a messaging application on your IP phone. It generates a request when you try to call another person over the network and sends the request to a server.

User Agent Servers (UAS)

A UAS is an entity that receives requests, processes them, and generates responses. There are different types of servers:

1. Proxy Server

It is the network element that takes a request from a user agent and forwards it toward the recipient (much like a router). When a request is generated without knowing the recipient's exact address in advance, the client sends the request to the proxy server. The proxy server, on behalf of the client, forwards the request to another proxy server or to the recipient itself.

There are two types of proxy servers:

Stateless Proxy Server − It simply forwards the message received. This type of server does not store any information about a call or a transaction.

Stateful Proxy Server − This type of proxy server keeps track of every request and response it receives and can use this information later if required. For example, it can retransmit a request if no response arrives from the other side in time.
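The difference between the two proxy types can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how a real SIP proxy is built: the class names, the dictionary-based messages, and the `transaction_id` key are all hypothetical, and the "network" is just a list that collects whatever gets sent.

```python
class StatelessProxy:
    """Forwards each message and remembers nothing about it."""

    def __init__(self, send):
        self.send = send  # callable that delivers a message to the next hop

    def forward(self, message):
        self.send(message)


class StatefulProxy:
    """Forwards each request and keeps it until a response arrives."""

    def __init__(self, send):
        self.send = send
        self.pending = {}  # transaction id -> request still awaiting a response

    def forward(self, message):
        self.pending[message["transaction_id"]] = message
        self.send(message)

    def on_response(self, transaction_id):
        # A response arrived, so the stored request is no longer needed.
        self.pending.pop(transaction_id, None)

    def on_timeout(self, transaction_id):
        # No response in time: retransmit the request if it is still pending.
        request = self.pending.get(transaction_id)
        if request is not None:
            self.send(request)


sent = []  # stands in for the network: every forwarded message lands here
proxy = StatefulProxy(sent.append)
invite = {"transaction_id": "t1", "method": "INVITE", "to": "user2"}
proxy.forward(invite)
proxy.on_timeout("t1")   # still unanswered, so the INVITE is sent again
proxy.on_response("t1")  # answer received, stored state is dropped
proxy.on_timeout("t1")   # nothing pending any more, nothing is sent
print(len(sent))         # how many times the INVITE went out
```

A stateless proxy has no `pending` table, so it could not perform the retransmission step at all. That is the practical trade-off between the two designs.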

2. Registrar Server

The registrar server accepts registration requests from user agents. It helps users to authenticate themselves within the network. It stores the URI and the location of users in a database (location server) to help other SIP servers within the same domain.

3. Redirect Server

A redirect server sends the request back to the client, indicating that the client needs to try a different route to reach the recipient. This generally happens when a recipient has moved from its original location, either temporarily or permanently.

4. Location Server

The addresses registered to a Registrar are stored in a Location Server. Only a proxy server or a redirect server can contact a location server.
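To make the registrar and location server relationship more concrete, here is a minimal in-memory sketch. It is an illustration only: the `LocationService` class, its method names, and the example addresses are hypothetical, and a real deployment would use a proper database and authentication.

```python
import time


class LocationService:
    """Toy in-memory location service (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps a user's public SIP URI to (current contact URI, expiry time).
        self._bindings = {}

    def register(self, uri, contact, expires=3600, now=None):
        """What a registrar does on REGISTER: store the user's current contact."""
        now = time.time() if now is None else now
        self._bindings[uri] = (contact, now + expires)

    def lookup(self, uri, now=None):
        """What a proxy or redirect server asks: where is this user right now?"""
        now = time.time() if now is None else now
        binding = self._bindings.get(uri)
        if binding is None or binding[1] <= now:
            return None  # unknown user or expired registration
        return binding[0]


location = LocationService()
location.register("sip:user2@server2.com", "sip:user2@192.0.2.20:5060",
                  expires=60, now=0)
print(location.lookup("sip:user2@server2.com", now=30))   # still registered
print(location.lookup("sip:user2@server2.com", now=120))  # registration expired
```

The `now` parameter exists only so the example is deterministic. In normal use it would be left out and the current time used instead.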

SIP Methods

INVITE: Used to establish a media session between the user agents.

ACK: Used to acknowledge the final responses to an INVITE method. An ACK always goes in the direction of INVITE.

BYE: Used to terminate an established session. This is a SIP request that can be sent by either the caller or the callee to end a session.

CANCEL: Used to terminate a pending session that is not yet established. It is a hop-by-hop request and can be sent by a user agent or a proxy.

OPTIONS: Used to query a user agent or a proxy server about its capabilities and to discover its current availability (often used as a heartbeat).

REGISTER: Used to register a user’s current location.

INFO: Used for mid-session signaling for an established media session.

SUBSCRIBE: Used by a user agent to establish a subscription to get notifications about a particular event.

NOTIFY: Used by a user agent to inform subscribers about the occurrence of a particular event.

PUBLISH: Used by a user agent to send event state information to a server.

SIP Call Flow

Before looking at SIP messages in detail, let's first focus on the pictorial representation of a basic SIP call flow. This is a typical case: user1 uses a SIP phone or application to reach user2. Proxy servers help to set up the session on behalf of the users. This arrangement of SIP proxies and end users is called the “SIP Trapezoid”.

The transaction starts when user1 sends an INVITE request to user2. But user1 does not know the exact location of user2 in the IP network, so it passes the request to proxy server1. Proxy server1, on behalf of user1, forwards the INVITE request for user2 to proxy server2. Proxy server1 also sends a TRYING response to user1, indicating that it is trying to reach user2.

Upon receiving the INVITE M2 request, proxy server2 forwards an INVITE request to user2 (assuming proxy server2 knows the location of user2; otherwise it would forward the request to another proxy server, so an INVITE request may travel through several proxies before reaching the endpoint). After forwarding INVITE M4, proxy server2 issues a TRYING response to proxy server1. The user2 SIP phone, on receiving the INVITE request, starts ringing. It sends a RINGING response back to proxy server2, and this response travels back along the same path to reach user1. So user1 gets feedback that user2 has received the INVITE request.

User2 at this point has a choice to accept or decline the call. As soon as user2 answers the call, the phone sends a 200 OK response to proxy server2. Retracing the route of the INVITE, the response reaches user1. The SIP phone of user1 then sends an ACK message to confirm the setup of the call. This three-way handshake (INVITE + 200 OK + ACK) is used for reliable call setup. The ACK message does not use the proxies to reach user2 because user1 now knows the exact location of user2.

Once the connection has been set up, media flows between the endpoints. Media is not controlled by SIP. When one party in the session decides to disconnect, it (user2 in the figure) sends a BYE message to the other party. The other party sends a 200 OK message to confirm the termination of the session.
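The message sequence described above can be written down as plain data, which makes it easy to see which messages pass through the proxies and which travel directly between the endpoints. This is just a sketch of the flow as described in this article. The names `CALL_FLOW` and `is_direct` are made up for the example, and 100 and 180 are the standard status codes for the TRYING and RINGING responses.

```python
# Each entry is (sender, receiver, message), in the order described above.
CALL_FLOW = [
    ("user1", "proxy1", "INVITE"),
    ("proxy1", "proxy2", "INVITE"),
    ("proxy1", "user1", "100 Trying"),
    ("proxy2", "user2", "INVITE"),
    ("proxy2", "proxy1", "100 Trying"),
    ("user2", "proxy2", "180 Ringing"),
    ("proxy2", "proxy1", "180 Ringing"),
    ("proxy1", "user1", "180 Ringing"),
    ("user2", "proxy2", "200 OK"),
    ("proxy2", "proxy1", "200 OK"),
    ("proxy1", "user1", "200 OK"),
    ("user1", "user2", "ACK"),     # sent directly, proxies are skipped
    ("user2", "user1", "BYE"),     # user2 hangs up
    ("user1", "user2", "200 OK"),  # confirms the termination
]


def is_direct(sender, receiver):
    """True when a message goes endpoint to endpoint without a proxy."""
    return sender.startswith("user") and receiver.startswith("user")


for step, (sender, receiver, message) in enumerate(CALL_FLOW, start=1):
    marker = "direct" if is_direct(sender, receiver) else "via proxy"
    print(f"{step:2d}. {sender:>6} -> {receiver:<6} {message:<12} ({marker})")
```

Reading the output top to bottom reproduces the trapezoid: signaling goes up through the proxies during setup, while the ACK, BYE, and final 200 OK travel along the bottom edge between the two users.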

SIP Request Message

Request-Line: Method SP Request-URI SP SIP-Version CRLF

Via: Contains the local address where the sender expects to receive the response.

Max-Forwards: Limits the number of hops a request can make on the way to its destination. It decreases by one at each hop.

To: Contains the display name “user2” and the SIP URI <sip:user2@server2.com>.

From: Contains the display name “user1”, the SIP URI <sip:user1@server1.com>, and a tag that works as an identifier of the caller in the dialog.

Call-ID: Unique identifier of the call.

CSeq: Contains an integer (incremented by one for each new request within the dialog) and a method name.

Contact: Contains a SIP URI that is a direct route to user1.

Note: The Via field is used to send the response to a request. The Contact field is used to send future requests. That is why the 200 OK response from user2 goes to user1 through the proxies, but when user2 generates a BYE request, it goes directly to user1 without the proxies.
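As a rough illustration of how these fields fit together, the sketch below assembles a bare INVITE request as text and then splits its first line back into the three parts of the Request-Line. Several things here are assumptions for the example only: the helper names `build_invite` and `parse_request_line`, the host `pc1.server1.com`, the Call-ID, branch, and tag values, UDP as the transport, and a Max-Forwards value of 70. The request also carries no body, whereas a real INVITE would normally include a session description.

```python
def build_invite(caller, callee, contact_host, call_id, branch, tag, cseq=1):
    """Return a bare INVITE request (no message body) as a string."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                      # Request-Line
        f"Via: SIP/2.0/UDP {contact_host};branch={branch}",  # where responses go
        "Max-Forwards: 70",                                  # hop limit
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",                   # tag identifies the caller
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <sip:{user}@{contact_host}>",             # direct route to the caller
        "Content-Length: 0",
    ]
    # Every line ends with CRLF, and an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request_line(message):
    """Split the first line into (method, request_uri, sip_version)."""
    method, request_uri, version = message.split("\r\n", 1)[0].split(" ")
    return method, request_uri, version


request = build_invite(
    caller="user1@server1.com",
    callee="user2@server2.com",
    contact_host="pc1.server1.com",
    call_id="a84b4c76e66710@pc1.server1.com",
    branch="z9hG4bK776asdhds",
    tag="1928301774",
)
print(request)
print(parse_request_line(request))
```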

SIP Response Message

Status Line: SIP-Version SP Status-Code SP Reason-Phrase CRLF

Via: All the elements through which the request came are tracked in Via (the user1 SIP device, proxy server1, and proxy server2).

To: Contains a tag added by the receiving end.

Contact: Contains the exact address of user2 for direct communication.
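The Status-Line format above is simple enough to parse by hand. The sketch below splits a status line into its three parts and maps the status code to its response class. The function names are made up for this example, and the class names follow the ranges defined in RFC 3261 (1xx to 6xx), which this article does not otherwise cover.

```python
def parse_status_line(line):
    """Split 'SIP-Version SP Status-Code SP Reason-Phrase' into its parts."""
    # The reason phrase may itself contain spaces, so split at most twice.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


def response_class(code):
    """Map a status code to its response class using the first digit."""
    classes = {
        1: "provisional",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
        6: "global failure",
    }
    return classes[code // 100]


for line in ("SIP/2.0 100 Trying\r\n",
             "SIP/2.0 180 Ringing\r\n",
             "SIP/2.0 200 OK\r\n",
             "SIP/2.0 486 Busy Here\r\n"):
    version, code, reason = parse_status_line(line)
    print(code, reason, "->", response_class(code))
```

In the call flow described earlier, TRYING and RINGING are provisional responses, while 200 OK is the final response that the ACK acknowledges.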

Conclusion

I hope you now have a brief overview of what SIP is and how it works. There are more call scenarios and concepts in SIP. I highly recommend going through RFC 3261 to understand the concepts in depth.