How VoiceBridge Achieves True Full-Duplex Audio in Production
What it takes to deliver true full-duplex AI voice: dual RTP legs, timing discipline, jitter handling, barge-in control, and safe Asterisk integration.
True full-duplex AI voice means both sides of a conversation can speak at the same time — with real-time interruption detection, stable RTP timing, and no blocking media primitives.
Many Asterisk-based implementations fall short here because they rely on AGI or file-based playback, where a playback command blocks until the audio finishes. VoiceBridge achieves production-grade duplex with ARI, ExternalMedia, and disciplined RTP engineering.
Source Repository:
https://github.com/mylinehub/omnichannel-crm/tree/main/mylinehub-voicebridge
Production Definition of Full-Duplex
In production telecom systems, full-duplex requires:
- Simultaneous inbound and outbound RTP streams
- No blocking playback operations
- Immediate barge-in detection (<100ms)
- Stable RTP cadence (20ms frame pacing)
- Correct SSRC and sequence discipline
- Bridge-level orchestration
VoiceBridge implements all of the above at the RTP layer, not just the application layer.
Core Architecture: ARI + Mixing Bridge + ExternalMedia
VoiceBridge does not rely on AGI. Instead, it uses ARI to create and manage bridges dynamically.
Caller Channel
│
▼
Mixing Bridge (ARI-controlled)
│
├── ExternalMedia Channel (RTP out → AI)
└── Caller Audio (RTP in)
ARI control is implemented in:
- AriBridgeImpl.java: src/main/java/com/mylinehub/voicebridge/ari/impl/AriBridgeImpl.java
- ExternalMediaManagerImpl.java: src/main/java/com/mylinehub/voicebridge/ari/impl/ExternalMediaManagerImpl.java
These components dynamically:
- Create mixing bridges
- Attach caller channel
- Attach ExternalMedia RTP endpoint
- Manage lifecycle events
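The setup steps above map onto documented ARI REST resources: `POST /bridges`, `POST /channels/externalMedia`, and `POST /bridges/{bridgeId}/addChannel`, plus `DELETE /bridges/{bridgeId}` for cleanup. What follows is a minimal sketch and is not taken from AriBridgeImpl.java or ExternalMediaManagerImpl.java. It only builds `java.net.http.HttpRequest` objects and does not send them. The class name `AriRequests`, the base URL, the credentials, and the `ulaw` format are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds (but does not send) ARI requests for one duplex session. */
final class AriRequests {
    private final String baseUrl;    // e.g. "http://127.0.0.1:8088/ari" (assumed)
    private final String authHeader; // ARI accepts HTTP basic auth

    AriRequests(String baseUrl, String user, String password) {
        this.baseUrl = baseUrl;
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        this.authHeader = "Basic " + token;
    }

    // Form encoding; adequate for the simple ids and host:port values used here.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    private HttpRequest post(String pathAndQuery) {
        return HttpRequest.newBuilder(URI.create(baseUrl + pathAndQuery))
                .header("Authorization", authHeader)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** POST /bridges?type=mixing: a mixing bridge lets both audio directions coexist. */
    HttpRequest createMixingBridge(String bridgeId) {
        return post("/bridges?type=mixing&bridgeId=" + enc(bridgeId));
    }

    /** POST /channels/externalMedia: an RTP/UDP leg aimed at our own media endpoint. */
    HttpRequest createExternalMedia(String app, String rtpHostPort, String format) {
        return post("/channels/externalMedia?app=" + enc(app)
                + "&external_host=" + enc(rtpHostPort)
                + "&format=" + enc(format)
                + "&encapsulation=rtp&transport=udp");
    }

    /** POST /bridges/{id}/addChannel: attach the caller and ExternalMedia channels. */
    HttpRequest addChannels(String bridgeId, String... channelIds) {
        return post("/bridges/" + enc(bridgeId) + "/addChannel?channel="
                + enc(String.join(",", channelIds)));
    }

    /** DELETE /bridges/{id}: part of lifecycle cleanup when the call ends. */
    HttpRequest destroyBridge(String bridgeId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/bridges/" + enc(bridgeId)))
                .header("Authorization", authHeader)
                .DELETE()
                .build();
    }
}
```

In practice, each request would be sent with `java.net.http.HttpClient`. The `id` field of the channel JSON returned by the ExternalMedia call would then be passed to `addChannels` together with the caller's channel id.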
RTP Discipline: The Foundation of Duplex Stability
Full-duplex is impossible without strict RTP engineering. VoiceBridge handles RTP generation internally instead of using file playback.
Core RTP components:
- RtpPacketizer.java: src/main/java/com/mylinehub/voicebridge/rtp/RtpPacketizer.java
- RtpSymmetricEndpoint.java: src/main/java/com/mylinehub/voicebridge/rtp/RtpSymmetricEndpoint.java
- RtpPortAllocator.java: src/main/java/com/mylinehub/voicebridge/rtp/RtpPortAllocator.java
1. RtpPacketizer.java
This class constructs outbound RTP packets manually.
Responsibilities include:
- Maintaining sequence numbers
- Monotonic timestamp increments (160 per 20 ms frame at an 8 kHz sample clock)
- Payload encoding
- Consistent SSRC per session
This ensures Asterisk treats AI-generated audio as a valid, continuous stream.
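The four responsibilities above fit in a few lines of Java. This is a minimal sketch of an RFC 3550 fixed-header packetizer. It assumes G.711 µ-law (static payload type 0) at 8 kHz with 20 ms frames. The class `SimpleRtpPacketizer` and its methods are hypothetical and are not taken from RtpPacketizer.java.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;

/** Hypothetical minimal RTP packetizer (RFC 3550 fixed header, no CSRCs or extensions). */
final class SimpleRtpPacketizer {
    static final int HEADER_LEN = 12;

    private final int payloadType;     // e.g. 0 = PCMU (G.711 u-law) in the static RTP/AVP table
    private final int samplesPerFrame; // 160 for a 20 ms frame at 8 kHz
    private final int ssrc;            // constant for the whole session
    private int sequence;              // 16-bit, wraps at 65536
    private long timestamp;            // 32-bit, wraps at 2^32
    private boolean markNext = true;   // marker bit on the first packet of a talkspurt

    SimpleRtpPacketizer(int payloadType, int samplesPerFrame, int ssrc, int initialSeq, long initialTs) {
        this.payloadType = payloadType & 0x7F;
        this.samplesPerFrame = samplesPerFrame;
        this.ssrc = ssrc;
        this.sequence = initialSeq & 0xFFFF;
        this.timestamp = initialTs & 0xFFFFFFFFL;
    }

    /** RFC 3550 recommends random initial sequence number and timestamp values. */
    static SimpleRtpPacketizer withRandomStart(int payloadType, int samplesPerFrame, int ssrc) {
        SecureRandom r = new SecureRandom();
        return new SimpleRtpPacketizer(payloadType, samplesPerFrame, ssrc,
                r.nextInt(1 << 16), r.nextInt() & 0xFFFFFFFFL);
    }

    /** Call when AI speech resumes after silence, e.g. at the start of a new response. */
    void startTalkspurt() { markNext = true; }

    /** Wraps one encoded frame (e.g. 160 bytes of u-law) in an RTP header. */
    byte[] packetize(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LEN + payload.length); // big-endian = network order
        buf.put((byte) 0x80);                                     // V=2, P=0, X=0, CC=0
        buf.put((byte) ((markNext ? 0x80 : 0x00) | payloadType)); // M bit + payload type
        buf.putShort((short) sequence);
        buf.putInt((int) timestamp);
        buf.putInt(ssrc);
        buf.put(payload);

        sequence = (sequence + 1) & 0xFFFF;                      // 16-bit wraparound
        timestamp = (timestamp + samplesPerFrame) & 0xFFFFFFFFL; // 32-bit wraparound
        markNext = false;
        return buf.array();
    }
}
```

The sequence number and timestamp are masked to 16 and 32 bits so they wrap the way receivers expect. The SSRC never changes during the session.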
2. RtpSymmetricEndpoint.java
Handles symmetric RTP behavior.
- Learns remote IP/port dynamically
- Maintains send/receive state
- Prevents one-way audio caused by NAT
This is critical in real-world deployments behind firewalls.
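Symmetric RTP usually means "latching": outbound audio goes to the source address seen on the peer's inbound RTP, not to the address that was signalled, because a NAT may have rewritten that address. This is a minimal sketch that assumes latch-on-first-packet behavior. `SymmetricRtpLatch` is a hypothetical name, and RtpSymmetricEndpoint.java may handle re-latching or packet validation differently.

```java
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical symmetric-RTP latch: reply to wherever the peer's RTP actually comes from. */
final class SymmetricRtpLatch {
    private final InetSocketAddress configured; // signalled address; may be wrong behind NAT
    private final AtomicReference<InetSocketAddress> learned = new AtomicReference<>();

    SymmetricRtpLatch(InetSocketAddress configured) {
        this.configured = configured;
    }

    /** Call for every inbound packet; the first observed source wins. Returns true if it latched now. */
    boolean onInbound(InetSocketAddress source) {
        return learned.compareAndSet(null, source);
    }

    /** Destination for outbound RTP: the learned address once known, otherwise the configured one. */
    InetSocketAddress sendTarget() {
        InetSocketAddress l = learned.get();
        return l != null ? l : configured;
    }

    boolean isLatched() { return learned.get() != null; }
}
```

A hardened version would typically latch only on datagrams that parse as RTP with the expected payload type. That way, a stray packet cannot redirect the outbound stream.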
3. RtpPortAllocator.java
Production systems must avoid RTP port collisions.
This component:
- Allocates even RTP ports
- Ensures thread-safe reservation
- Prevents reuse conflicts under load
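By convention, RTP uses an even port and RTCP uses the next odd port (RFC 3550). An allocator can therefore hand out even ports only and treat each pair as one slot. The sketch below is hypothetical: `EvenRtpPortAllocator` is not taken from RtpPortAllocator.java. It gets thread safety from a synchronized `BitSet`. A round-robin cursor keeps a just-released port from being reused immediately, while late packets from the previous call may still be in flight.

```java
import java.util.BitSet;

/** Hypothetical allocator handing out even RTP ports; port p+1 is implicitly reserved for RTCP. */
final class EvenRtpPortAllocator {
    private final int basePort;  // first even port in the range
    private final int slots;     // number of (RTP, RTCP) pairs that fit below maxPortExclusive
    private final BitSet inUse;  // bit i set => basePort + 2*i is allocated
    private int cursor;          // round-robin start: avoids reusing a just-released port at once

    EvenRtpPortAllocator(int minPort, int maxPortExclusive) {
        this.basePort = (minPort % 2 == 0) ? minPort : minPort + 1;
        this.slots = (maxPortExclusive - basePort) / 2;
        if (slots < 1) throw new IllegalArgumentException("port range too small");
        this.inUse = new BitSet(slots);
    }

    /** Returns a free even RTP port, or -1 if every pair in the range is taken. */
    synchronized int allocate() {
        for (int i = 0; i < slots; i++) {
            int slot = (cursor + i) % slots;
            if (!inUse.get(slot)) {
                inUse.set(slot);
                cursor = (slot + 1) % slots;
                return basePort + 2 * slot;
            }
        }
        return -1;
    }

    /** Returns a port to the pool; intended to be called from session teardown. */
    synchronized void release(int port) {
        int offset = port - basePort;
        if (offset < 0 || offset % 2 != 0 || offset / 2 >= slots) {
            throw new IllegalArgumentException("port not managed by this allocator: " + port);
        }
        inUse.clear(offset / 2);
    }
}
```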
Why Mixing Bridges Enable True Duplex
A mixing bridge allows multiple media streams to exist simultaneously.
- Caller audio flows into bridge
- ExternalMedia audio flows into bridge
- Asterisk mixes both streams
Because no playback command is blocking, inbound audio continues even while outbound AI speech is transmitted.
How Barge-In Works in VoiceBridge
Since caller RTP is continuously streamed to the AI pipeline:
- Speech detection runs in parallel
- Interruption is detected as soon as caller speech is recognized, not after playback ends
- The outbound RTP stream can be halted mid-utterance, at the next 20 ms frame boundary
There is no need to wait for file playback completion.
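An energy threshold over the inbound frames is one simple way to illustrate parallel barge-in detection: the outbound sender polls a flag between frames. This is a hedged sketch. It assumes inbound audio has already been decoded to 16-bit linear PCM. The class `BargeInDetector`, the RMS threshold, and the frame count are illustrative assumptions, not VoiceBridge's actual speech detection. With 3 consecutive 20 ms frames, the flag is raised after about 60 ms of sustained speech, within the <100ms target listed earlier (network delay not included).

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical energy-threshold barge-in detector; a production VAD would be more robust. */
final class BargeInDetector {
    private final double rmsThreshold;  // assumed tuning value for 16-bit linear PCM
    private final int framesToTrigger;  // e.g. 3 frames x 20 ms = 60 ms of sustained speech
    private final AtomicBoolean bargeIn = new AtomicBoolean(false);
    private int consecutiveSpeechFrames;

    BargeInDetector(double rmsThreshold, int framesToTrigger) {
        this.rmsThreshold = rmsThreshold;
        this.framesToTrigger = framesToTrigger;
    }

    /** Feed every inbound caller frame, including while AI audio is playing. */
    synchronized void onCallerFrame(short[] pcm) {
        double sumSquares = 0;
        for (short s : pcm) {
            sumSquares += (double) s * s;
        }
        double rms = pcm.length == 0 ? 0 : Math.sqrt(sumSquares / pcm.length);
        // Any quiet frame resets the run, so short clicks do not trigger barge-in.
        consecutiveSpeechFrames = rms >= rmsThreshold ? consecutiveSpeechFrames + 1 : 0;
        if (consecutiveSpeechFrames >= framesToTrigger) {
            bargeIn.set(true);
        }
    }

    /** Polled by the outbound RTP sender before each 20 ms frame (lock-free read). */
    boolean shouldStopPlayback() { return bargeIn.get(); }

    /** Clears the flag when the AI starts its next response. */
    synchronized void reset() {
        consecutiveSpeechFrames = 0;
        bargeIn.set(false);
    }
}
```

The outbound loop checks `shouldStopPlayback()` before each frame and simply stops sending. The caller's inbound stream never paused, so nothing else needs to restart.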
Production Timing Guarantees
VoiceBridge enforces:
- 20ms frame pacing
- Clock-drift control
- Continuous timestamp increments
- Payload consistency
If these rules are violated, Asterisk produces jitter, silence, or dropouts. The internal RTP engine prevents these conditions.
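A common way to keep 20 ms pacing stable is to compute each frame's deadline from a fixed start time (start + n * 20 ms). Sleeping 20 ms after each send would instead let every late wake-up push all later frames back. The sketch below assumes that approach. `FramePacer` is hypothetical, and it removes only the sender's own scheduling drift. Compensating for drift between the VoiceBridge host clock and Asterisk's clock is a separate problem this code does not address.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongSupplier;

/** Hypothetical 20 ms frame pacer using absolute deadlines (start + n * 20 ms). */
final class FramePacer {
    static final long FRAME_NANOS = TimeUnit.MILLISECONDS.toNanos(20);

    private final LongSupplier clock; // System::nanoTime in production; injectable for tests
    private final long start;
    private long nextFrame;           // index of the next frame to send

    FramePacer(LongSupplier clock) {
        this.clock = clock;
        this.start = clock.getAsLong();
    }

    /** Deadline of frame n is fixed in advance, so a late wake-up does not shift later frames. */
    long deadlineOf(long n) { return start + n * FRAME_NANOS; }

    /** Nanoseconds until the next frame is due (0 if it is already due or late). */
    long delayForNextFrame() {
        return Math.max(0L, deadlineOf(nextFrame) - clock.getAsLong());
    }

    /** Records that the next frame has been sent. */
    void markSent() { nextFrame++; }

    /** Blocks until the next frame is due, then marks it sent. */
    void awaitAndMark() {
        long remaining;
        while ((remaining = delayForNextFrame()) > 0) {
            LockSupport.parkNanos(remaining); // may return early; the loop re-checks the clock
        }
        markSent();
    }

    long framesSent() { return nextFrame; }
}
```

A sender thread would call `awaitAndMark()` and then transmit the next packetized frame. `ScheduledExecutorService.scheduleAtFixedRate` offers similar fixed-rate semantics if a dedicated thread per call is undesirable.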
Why This Works in Production (Not Just Lab)
Many demo systems appear duplex in a controlled LAN. They fail under:
- NAT environments
- High call concurrency
- Clock drift conditions
- Packet jitter
VoiceBridge handles:
- Symmetric RTP learning
- Port allocation scaling
- Bridge lifecycle cleanup
- Session-level SSRC isolation
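Session-level SSRC isolation can be illustrated with a registry that gives each call a random 32-bit SSRC and rejects duplicates among live sessions. RFC 3550 specifies that SSRCs should be chosen randomly. `SsrcRegistry` is a hypothetical helper, not a VoiceBridge class. It assumes the SSRC is released in the same teardown path that destroys the call's bridge.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry giving each call session its own random, non-colliding SSRC. */
final class SsrcRegistry {
    private final SecureRandom random = new SecureRandom();
    private final Map<Integer, String> ssrcToSession = new ConcurrentHashMap<>();

    /** Picks a random 32-bit SSRC not used by any live session and binds it to sessionId. */
    int assign(String sessionId) {
        while (true) {
            int candidate = random.nextInt();
            // putIfAbsent is atomic, so two sessions can never claim the same SSRC.
            if (ssrcToSession.putIfAbsent(candidate, sessionId) == null) {
                return candidate;
            }
        }
    }

    /** Which session owns this SSRC (null if unknown or already released). */
    String sessionOf(int ssrc) { return ssrcToSession.get(ssrc); }

    /** Frees the SSRC only if it still belongs to this session. */
    void release(int ssrc, String sessionId) { ssrcToSession.remove(ssrc, sessionId); }

    int liveSessions() { return ssrcToSession.size(); }
}
```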
Comparison: AGI vs VoiceBridge Duplex
| Capability | AGI | VoiceBridge |
|---|---|---|
| Simultaneous RTP | No | Yes |
| Barge-In | Delayed | Instant |
| Frame-Level Control | No | Yes |
| Bridge Orchestration | No | Full ARI Control |
Final Summary
VoiceBridge achieves real full-duplex audio not by clever scripting, but by respecting how RTP, Asterisk bridges, and media streams actually work.
It combines:
- ARI event-driven control
- Mixing bridges
- ExternalMedia RTP streaming
- Custom RTP packetization
- Symmetric endpoint handling
- Production-safe port allocation
That is why it works in production — not just in demos.