SIP vs SDP vs RTP: What Each Protocol Actually Does in a Call

MYLINEHUB Team • 2026-03-10 • 11 min

Confused between SIP, SDP, and RTP? Learn what each one does, how they work together during a call, and why understanding the difference helps in VoIP design and troubleshooting.

In telecom, people often hear SIP, SDP, and RTP together and assume they are all doing the same job. They are not. They work together, but each one has a very different responsibility.

This article takes the reader from beginner → practical engineer → advanced troubleshooting mindset. It starts with a very simple mental model, then moves into call flow, structure, negotiation, packet behavior, troubleshooting, and real-world failures.

What you will understand by the end:

• what SIP really does
• what SDP really does
• what RTP really does
• how they appear in a real call flow
• which one carries media and which one does not
• how signaling, description, and packet transport are different layers of the story
• why troubleshooting gets much easier once you separate these clearly

🌐 Start with HTTP so the difference becomes obvious

Think about how a normal website works.

A browser sends an HTTP request to a server. The server sends an HTTP response back. That pattern is mostly about asking for resources and receiving data.

But a phone call or real-time voice session needs something more structured than “give me data”. A call needs:

• a way to start and control the session
• a way to describe how media should work
• a way to carry the actual media in real time

That is exactly where these three fit:

SIP → starts / manages / ends the call
SDP → describes how media should be exchanged
RTP → carries the actual audio/video packets

🧭 The shortest correct explanation

SIP → Call signaling → Start / modify / end session
SDP → Media description → Codec / IP / port / direction
RTP → Media transport → Actual voice/video packets

Memory trick: SIP says “let’s have a call.” SDP says “here is how the media should work.” RTP says “here are the actual media packets.”

⚡ One-line difference before we go deeper

SIP is about the session conversation.

SDP is about the media agreement.

RTP is about the real-time media stream itself.

📞 What SIP actually does

SIP = Session Initiation Protocol.

SIP is a signaling protocol. Its job is not to carry the actual voice. Its job is to control the session.

SIP typically handles:

• finding or reaching the destination
• ringing the destination
• accepting or rejecting the call
• creating the dialog/session state
• putting the call on hold
• transferring the call
• ending the call

Common SIP methods:

Method | Purpose | Simple meaning
INVITE | Start or modify a session | “Let’s begin a call.”
ACK | Confirm final response | “I got your answer.”
BYE | Terminate session | “The call is over.”
CANCEL | Cancel before answer | “Stop trying to connect.”
OPTIONS | Capability check | “What can you do?”
REGISTER | Register endpoint | “I’m here and reachable.”

The key point is: SIP manages the conversation around the call, not the voice stream itself.
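
To make the request/response shape of SIP concrete, here is a minimal Python sketch that classifies the first line of a SIP message. The function name classify_start_line and the returned dictionary layout are illustrative assumptions, not part of any SIP library, and the method set only covers the six methods listed above.

```python
# Methods from the table above; real SIP defines more (for example UPDATE, REFER).
SIP_METHODS = {"INVITE", "ACK", "BYE", "CANCEL", "OPTIONS", "REGISTER"}


def classify_start_line(line: str) -> dict:
    """Classify the first line of a SIP message as a request or a response."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        raise ValueError(f"not a SIP start line: {line!r}")
    if parts[0] == "SIP/2.0":
        # Response form: SIP/2.0 <status-code> <reason-phrase>
        return {"kind": "response", "status": int(parts[1]), "reason": parts[2]}
    if parts[2] == "SIP/2.0":
        # Request form: <method> <request-uri> SIP/2.0
        return {
            "kind": "request",
            "method": parts[0],
            "uri": parts[1],
            "known_method": parts[0] in SIP_METHODS,
        }
    raise ValueError(f"not a SIP/2.0 start line: {line!r}")


print(classify_start_line("INVITE sip:1002@pbx.example.com SIP/2.0"))
print(classify_start_line("SIP/2.0 180 Ringing"))
```

Nothing in this line says anything about codecs or audio, which is the point: the start line only tells you what the signaling is trying to do.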

📄 What SDP actually does

SDP = Session Description Protocol.

In practice, think of SDP as: a text-based description of the media rules for the session.

SDP usually tells the other side:

• what media type exists: audio, video, application
• which codecs are supported
• which IP address to send media to
• which port to send media to
• whether the stream is sendrecv, sendonly, recvonly, or inactive
• in advanced systems, DTLS / ICE / fingerprint / BUNDLE / mux details

Important: SDP often travels inside SIP, but SDP is not SIP itself. It is the media description payload that SIP may carry.

Easy memory line: SIP says there will be a call. SDP explains how the media should work inside that call.
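
The fields listed above can be pulled out of an SDP body with a few lines of Python. This is a rough sketch, not a full SDP parser: the helper name parse_sdp and the sample text are assumptions made for illustration, and it only handles a single audio section with numeric payload types.

```python
def parse_sdp(sdp: str) -> dict:
    """Extract the fields most often checked when debugging media."""
    info = {
        "ip": None,
        "media": None,
        "port": None,
        "payload_types": [],
        "codecs": {},
        # When no direction attribute is present, sendrecv is the default.
        "direction": "sendrecv",
    }
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            info["ip"] = line.split()[-1]
        elif line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media, port, _proto, *fmts = line[2:].split()
            info["media"] = media
            info["port"] = int(port)
            info["payload_types"] = [int(f) for f in fmts]
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            info["codecs"][int(pt)] = encoding
        elif line in ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"):
            info["direction"] = line[2:]
    return info


SAMPLE_SDP = (
    "v=0\r\n"
    "o=- 3747 3747 IN IP4 192.168.1.10\r\n"
    "s=VoIP Call\r\n"
    "c=IN IP4 192.168.1.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 8\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
    "a=sendrecv\r\n"
)

print(parse_sdp(SAMPLE_SDP))
```

The output is exactly the information the other side needs before it can send a single media packet: where to send, which codecs are on offer, and in which direction.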

🎙️ What RTP actually does

RTP = Real-time Transport Protocol.

RTP carries the actual media stream. If two people are talking, RTP is the protocol that usually transports the encoded audio in real time.

RTP packets usually include:

• payload type
• sequence number
• timestamp
• SSRC
• media payload data

So when someone says “the call is connected but no audio is coming”, very often SIP worked, SDP partly worked, but RTP is not flowing correctly — or SDP described the RTP path incorrectly.

🆚 Quick comparison table

Item | Main job | Human analogy | Carries voice? | Typical example
SIP | Signal and manage the session | The call-control conversation | No | INVITE, 180 Ringing, 200 OK, BYE
SDP | Describe media rules | The agreement sheet | No | Codec list, IP, port, direction
RTP | Carry real-time media packets | The actual sound/video stream | Yes | Voice frames every 20 ms

🔄 Real call flow: how SIP, SDP, and RTP appear in order

Caller → Receiver: SIP INVITE + SDP Offer
Receiver → Caller: 180 Ringing
Receiver → Caller: 200 OK + SDP Answer
Caller → Receiver: ACK
Caller ↔ Receiver: RTP / SRTP Media Flow
Caller or Receiver: SIP BYE

SIP controls the session. SDP negotiates media details. RTP carries live media after negotiation.

📦 A real SIP message carrying SDP

This is the exact point where many people confuse SIP and SDP. Look closely: SIP is the outer message. SDP is the body inside it.

INVITE sip:1002@pbx.example.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.10:5060;branch=z9hG4bK-12345
Max-Forwards: 70
From: "Alice" <sip:1001@pbx.example.com>;tag=abc123
To: <sip:1002@pbx.example.com>
Call-ID: 9f8e7d6c@example.com
CSeq: 1 INVITE
Contact: <sip:1001@192.168.1.10:5060>
Content-Type: application/sdp
Content-Length: 222

v=0
o=- 3747 3747 IN IP4 192.168.1.10
s=VoIP Call
c=IN IP4 192.168.1.10
t=0 0
m=audio 49170 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=sendrecv

INVITE / Via / From / To / Call-ID / CSeq = SIP
v=0 / c= / m=audio / a=rtpmap / a=sendrecv = SDP
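
The outer/inner relationship can be shown in code: everything before the first blank line is SIP, everything after it is the body. The sketch below is a simplified illustration, assuming one header per line with no folding and no repeated header names. The name split_sip_message and the shortened sample message are made up for this example.

```python
def split_sip_message(raw: bytes):
    """Split a SIP message into start line, headers, and body (for example SDP)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for header in header_lines:
        name, _, value = header.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


SDP_BODY = (
    "v=0\r\n"
    "o=- 3747 3747 IN IP4 192.168.1.10\r\n"
    "s=VoIP Call\r\n"
    "c=IN IP4 192.168.1.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 8\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
).encode("ascii")

SIP_HEAD = (
    "INVITE sip:1002@pbx.example.com SIP/2.0\r\n"
    "Call-ID: 9f8e7d6c@example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    f"Content-Length: {len(SDP_BODY)}\r\n"
).encode("ascii")

RAW_MESSAGE = SIP_HEAD + b"\r\n" + SDP_BODY

start_line, headers, body = split_sip_message(RAW_MESSAGE)
print(start_line)                                   # SIP start line
print(headers["content-type"])                      # tells us the body is SDP
print(int(headers["content-length"]) == len(body))  # body length in bytes
```

Content-Length counts the bytes of the body only, including the CRLF line endings, which is why the sample computes it from the body instead of hard-coding a number.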

🧾 What each one looks like structurally

Item | Format style | Typical content | Human-readable? | Carries media?
SIP | Request / response message | INVITE, 200 OK, BYE, headers, routing info | Yes | No
SDP | Text session description body | codec list, port, IP, direction, ICE, fingerprint | Yes | No
RTP | Packet stream | timestamps, sequence numbers, SSRC, payload | Usually no, not in raw packet form | Yes

🧠 What each one is responsible for

✅ SIP responsibilities

• locate or reach destination
• create session dialog
• ring / answer / cancel / terminate
• handle mid-call signaling like hold, transfer, update

✅ SDP responsibilities

• advertise supported codecs
• advertise media destination IP and port
• define media direction
• define media transport profile
• describe secure-media and ICE details in advanced systems

✅ RTP responsibilities

• carry actual encoded voice or video samples
• keep sequencing for real-time delivery
• provide timestamps for playback timing
• identify synchronization source
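
The sequencing and timestamp duties are easier to picture with numbers. The sketch below assumes packets arrive in order and uses two hypothetical helper names, timestamp_step and seq_gap. It shows how far the RTP timestamp advances per packet and how a gap in sequence numbers reveals loss, including the 16-bit wraparound.

```python
def timestamp_step(clock_rate_hz: int, ptime_ms: int) -> int:
    """RTP timestamp increment for one packet carrying ptime_ms of audio."""
    return clock_rate_hz * ptime_ms // 1000


def seq_gap(prev_seq: int, seq: int) -> int:
    """Packets missing between two packets received one after the other.

    Sequence numbers are 16 bits wide, so they wrap from 65535 back to 0.
    Assumes in-order arrival; reordered packets would need extra handling.
    """
    return (seq - prev_seq - 1) % 65536


# PCMU uses an 8000 Hz clock, so 20 ms of audio advances the timestamp by 160.
print(timestamp_step(8000, 20))
# Opus is signaled with a 48000 Hz clock, so the same 20 ms advances it by 960.
print(timestamp_step(48000, 20))

print(seq_gap(10, 11))     # consecutive packets, nothing missing
print(seq_gap(10, 13))     # packets 11 and 12 never arrived
print(seq_gap(65535, 0))   # wraparound is not a loss
```

A receiver uses the timestamp to decide when to play each frame and the sequence number to notice what is missing. Neither value says anything about call state, which stays in SIP.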

📍 Where each one sits in the stack

Application layer
• SIP: signaling protocol
• SDP: session/media description
• HTTP / WS: can also carry SDP in some systems

Transport layer
• UDP: common for SIP / RTP / RTCP
• TCP / TLS: common for SIP-TCP, HTTPS, WSS

SDP is not a media packet stream like RTP. RTP uses negotiated paths after signaling/description are done.

🎯 A simple real-world example

Imagine Alice calls Bob.

Step 1 — SIP: Alice’s phone sends an INVITE to Bob’s side. This is the “I want to start a call” part.

Step 2 — SDP: Inside that signaling exchange, Alice says: “I support Opus and PCMU. Send audio to this IP and port.”

Step 3 — SDP answer: Bob replies: “I accept PCMU. Send audio to my IP and port.”

Step 4 — RTP: Once negotiation is done, the actual audio packets start flowing. That is the real media stream.

So if the user says “the phone rang and connected, but I heard nothing,” the session part succeeded — but the media part did not.

🧩 How offer/answer makes SIP, SDP, and RTP work together

A very important deeper idea is that SDP usually works in offer/answer style.

Phase | What happens | Example
Offer | One side describes what it can do | “I support Opus, PCMU, PCMA. Send to IP:port A.”
Answer | Other side accepts a compatible subset | “I accept PCMU. Send to IP:port B.”
RTP flow | Media begins using negotiated details | Audio packets start using the agreed codec/path

That means RTP does not magically decide where to go. It follows the result of the negotiated session description.

🔢 Example: codec negotiation in plain language

Suppose Alice offers:

m=audio 49170 RTP/AVP 111 0 8
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000

Bob only supports:

m=audio 52000 RTP/AVP 0
a=rtpmap:0 PCMU/8000

The result is usually PCMU. Not because PCMU is “better”, but because it is the codec that both sides can use.
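
The same intersection can be written as a small Python sketch. The function name negotiate_codecs is hypothetical, and the logic is deliberately simplified: it matches on the encoding string from a=rtpmap and keeps the offerer's order, which is a common convention but not the only behavior you will see in real devices.

```python
def negotiate_codecs(offered: list, supported: set) -> list:
    """Return offered codecs the answerer also supports, in the offer's order."""
    # Encoding names are compared case-insensitively (opus vs OPUS).
    supported_lower = {codec.lower() for codec in supported}
    return [codec for codec in offered if codec.lower() in supported_lower]


alice_offer = ["opus/48000/2", "PCMU/8000", "PCMA/8000"]
bob_supports = {"PCMU/8000"}

common = negotiate_codecs(alice_offer, bob_supports)
print(common)

if not common:
    # With no codec in common the media section cannot be accepted.
    print("no shared codec: expect the call or the media stream to be rejected")
```

If Bob supported both PCMA and PCMU, this sketch would list PCMU first because Alice offered it first.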

🧪 RTP packet example: this is very different from SIP/SDP

SIP and SDP are text-heavy and human-readable. RTP is packet-oriented and built for real-time delivery.

RTP Header (conceptual)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       Sequence Number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Media Payload ...                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Notice the difference: SIP and SDP help organize the session, while RTP is already at the per-packet media delivery level.
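
To see how different this is from parsing text, here is a minimal Python sketch that decodes the 12-byte fixed header from the layout above using the standard struct module. The function name parse_rtp_header and the sample values (sequence 1000, timestamp 160, an arbitrary SSRC) are assumptions for illustration. CSRC entries and header extensions are not decoded.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two single bytes, a 16-bit and two 32-bit integers.
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built sample: version 2, payload type 0 (PCMU), then 160 payload bytes.
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160, 0x12345678) + b"\xff" * 160

header = parse_rtp_header(sample)
print(header)
print(len(sample) - 12, "payload bytes")
```

The payload type here is only a number. What that number means (PCMU, Opus, telephone-event) comes from the SDP that was negotiated earlier.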

🧠 Why SIP success does not guarantee media success

This is one of the most important real-world lessons.

A SIP dialog can be completely successful:

• INVITE sent
• 180 Ringing received
• 200 OK received
• ACK sent

And yet the user may still experience no audio, one-way audio, broken hold, bad codec behavior, or WebRTC connection issues.

Why? Because signaling success is not the same thing as media success. The media path still has to be negotiated correctly and then actually work.

🌍 Common failure examples

Example 1: Call rings, answers, but no audio

SIP worked. SDP may have advertised a private or unreachable IP address. RTP went to the wrong place.
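
A quick check for this case is to look at the address in the SDP c= line. The sketch below uses Python's standard ipaddress module. The function name sdp_connection_is_private is hypothetical, and it only handles unicast c= lines. A private address is not automatically wrong (both phones may sit on the same LAN), so treat the result as a hint, not a verdict.

```python
import ipaddress


def sdp_connection_is_private(sdp: str) -> bool:
    """Return True if the first c= line advertises a non-public address."""
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith(("c=IN IP4 ", "c=IN IP6 ")):
            address = line.split()[2]
            # is_private covers RFC 1918 ranges plus other reserved blocks.
            return ipaddress.ip_address(address).is_private
    raise ValueError("no c= line found in SDP")


print(sdp_connection_is_private("v=0\r\nc=IN IP4 192.168.1.10\r\n"))
print(sdp_connection_is_private("v=0\r\nc=IN IP4 8.8.8.8\r\n"))
```

If the far end is on the public internet and the SDP advertises 192.168.x.x, RTP sent to that address has nowhere useful to go.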

Example 2: One-way audio

One side’s SDP answer may contain the wrong return address, or point to a port that a firewall blocks. One RTP direction works, the other does not.

Example 3: Call connects but wrong codec is used

SIP is fine, but codec negotiation in SDP selected something unexpected, or transcoding logic changed behavior.

Example 4: Hold/resume behaves strangely

A later SDP exchange may have changed direction attributes to sendonly, recvonly, or inactive.
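
When debugging hold and resume, it helps to work out which way media is allowed to flow for a given pair of direction attributes. This is a minimal sketch with a hypothetical function name, media_flows. It only evaluates the attributes, not whether packets actually arrive.

```python
SENDS = {"sendrecv", "sendonly"}
RECEIVES = {"sendrecv", "recvonly"}
VALID = SENDS | RECEIVES | {"inactive"}


def media_flows(offer_dir: str, answer_dir: str) -> dict:
    """Which directions may carry media for an offer/answer direction pair."""
    if offer_dir not in VALID or answer_dir not in VALID:
        raise ValueError("unknown direction attribute")
    return {
        # Media flows one way only if that side sends and the other receives.
        "offerer_to_answerer": offer_dir in SENDS and answer_dir in RECEIVES,
        "answerer_to_offerer": answer_dir in SENDS and offer_dir in RECEIVES,
    }


print(media_flows("sendrecv", "sendrecv"))   # normal two-way call
print(media_flows("sendonly", "recvonly"))   # typical hold, for example with music
print(media_flows("inactive", "inactive"))   # nothing flows in either direction
```

If a call stays silent after resume, comparing the direction attributes in the latest offer and answer against this logic is a fast first check.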

🔐 How this changes in WebRTC

In WebRTC, the same logic still applies:

• some signaling channel starts the session logic
• SDP describes codecs, ICE, DTLS fingerprints, media sections, directions
• SRTP carries the actual secure media

The big difference is that SIP may not be present at all. WebRTC can use a custom signaling method like WebSocket, HTTP API, Socket.IO, or another application channel.

Advanced but important lesson: SIP is one common signaling protocol, but SDP is not tied only to SIP. SDP can be exchanged through other signaling systems too.
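
As an illustration of SDP travelling without SIP, the sketch below wraps an SDP string in a JSON message of the kind an application might push over a WebSocket. The envelope with "type" and "sdp" keys mirrors the fields of a browser session description, but the helper names wrap_sdp and unwrap_sdp and the idea of sending exactly this envelope are assumptions: every WebRTC application defines its own signaling format.

```python
import json


def wrap_sdp(kind: str, sdp: str) -> str:
    """Serialize an SDP offer or answer into a JSON signaling message."""
    if kind not in ("offer", "answer"):
        raise ValueError("kind must be 'offer' or 'answer'")
    return json.dumps({"type": kind, "sdp": sdp})


def unwrap_sdp(message: str) -> tuple:
    """Recover (kind, sdp) from a JSON signaling message."""
    data = json.loads(message)
    return data["type"], data["sdp"]


offer_sdp = "v=0\r\nm=audio 9 UDP/TLS/RTP/SAVPF 111\r\na=rtpmap:111 opus/48000/2\r\n"

wire_message = wrap_sdp("offer", offer_sdp)
print(wire_message)

kind, sdp = unwrap_sdp(wire_message)
print(kind, sdp == offer_sdp)
```

No INVITE, no 200 OK, no Call-ID: the SDP does the same describing job, only the carrier changed.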

⚠️ Common confusion points

❌ Mistake 1: “SIP carries audio”

No. SIP carries signaling, not the audio stream. RTP or SRTP carries the audio.

❌ Mistake 2: “SDP is the same as SIP”

No. SDP is often inside SIP, but it is a different format with a different job.

❌ Mistake 3: “If SIP succeeds, media is guaranteed”

No. SIP can succeed while RTP fails because of bad SDP, NAT, firewall, blocked ports, or codec mismatch.

❌ Mistake 4: “RTP decides codecs”

No. Codec agreement is usually described in SDP first. RTP then carries the chosen codec payload.

❌ Mistake 5: “SDP is only for SIP phones”

No. WebRTC also depends heavily on SDP.

🛠️ Troubleshooting map: if something breaks, where should you look?

Call not ringing → Check SIP signaling first
Call connects, no audio → Check SDP + RTP path
One-way audio → Check SDP IP/port/NAT
Codec failure → Check SDP codec intersection
Hold / direction issue → Check SDP sendonly/recvonly
Media jitter / gaps → Check RTP / RTCP behavior

Best mindset: separate signaling problems from media problems.

🧬 Beginner → advanced understanding ladder

Beginner level

SIP starts the call. SDP describes the media. RTP carries the voice.

Intermediate level

SIP messages such as INVITE and 200 OK often include SDP. That SDP decides codec, IP, port, and direction. Once both sides agree, RTP starts flowing on the negotiated path.

Advanced level

SIP is signaling. SDP is session/media description. RTP is real-time media transport. Many telecom failures happen because signaling succeeds while media negotiation or transport fails due to codec mismatch, NAT rewriting, incorrect advertised addresses, blocked RTP ports, wrong media direction, or secure-media negotiation issues.

Expert mindset

Never stop at “the SIP call connected”. Always ask: what was negotiated in SDP, which codec/path was actually selected, and did RTP really flow correctly in both directions?

❓ Quick FAQ

Is SIP more important than RTP?

They solve different problems. Without SIP-like signaling, sessions are hard to manage. Without RTP, there is no real-time media stream.

Can RTP exist without SIP?

Yes. RTP is a media transport protocol and does not depend on SIP. SIP is one signaling system, not the only one.

Can SDP exist without SIP?

Yes. WebRTC commonly exchanges SDP over custom signaling channels such as WebSocket or HTTP APIs.

Why do engineers inspect SDP so much?

Because codec selection, destination IP/port, direction, and secure-media details often explain why media succeeds or fails.

✅ Final takeaway

If you remember only one thing, remember this:

SIP is the conversation about the call.
SDP is the agreement about the media.
RTP is the real media stream itself.

Once you separate those three mentally, VoIP becomes much easier to design, debug, explain, and optimize. Many confusing telecom problems stop looking random and start looking traceable. 🎯
