Why Most “AI Voice for Asterisk” Solutions Fail at Full Duplex

MYLINEHUB Team • 2026-02-12 • 12 min

Most integrations break on duplex: RTP direction, timing drift, NAT, barge-in, and audio injection. Learn why—and how VoiceBridge avoids the traps.

Asterisk makes it look easy to build an “AI voice bot”: answer a call, record audio, run STT → LLM → TTS, then play audio back. In demos, this works. In production, it breaks, especially the moment you want true full duplex: the caller can speak while the bot is speaking, interruptions (“barge-in”) work, and the audio stays stable under NAT, jitter, and real carrier RTP behavior.

This article explains the real reasons most integrations fail, and what a production-grade duplex bridge must do differently. We use the open-source reference implementation from MYLINEHUB VoiceBridge to show what “real” looks like: https://github.com/mylinehub/omnichannel-crm/tree/main/mylinehub-voicebridge.

If you want the full VoiceBridge architecture overview first, read: https://mylinehub.com/articles/mylinehub-voicebridge-architecture.

The “Demo Trap”: Why Full Duplex Looks Easy Until You Ship

The demo version of an AI call usually runs in a lab:

single server
no NAT (or predictable NAT)
clean LAN RTP
one call at a time
audio files or slow turn-based conversation

Full duplex dies in production because the system stops being “a script” and becomes a distributed real-time media pipeline:

RTP is continuous (typically one packet every 20 ms for G.711, while internal PCM16 processing often runs on 10 ms or 20 ms frames)
two independent audio directions must remain stable (caller→bot, bot→caller)
the bot must stop talking the instant the caller interrupts (barge-in)
timing must be correct even under jitter and packet loss
ports, NAT, firewall policy, and session cleanup must work at scale

Most solutions fail because they solve “AI” but not “telecom media engineering”.

Failure #1: Turn-Based Architectures Pretend to Be “Voice AI”

The most common approach is still: record → transcribe → think → synthesize → play. This is not full duplex. It is an IVR with AI content.

Common examples:

AGI scripts
AMI-triggered audio playback flows
Dialplan macros that run external commands

These fail at full duplex because they are structurally blocking:

While Asterisk is playing your TTS file, your “bot” is not listening continuously.
While you are recording caller audio, the bot is not talking.
Interruptions become impossible (or extremely delayed), because audio is file-based and buffered.

VoiceBridge includes an explicit minimal AGI example mainly to show that AGI is not enough for duplex: MinimalAgiExample.java. True duplex requires RTP streaming + ARI media graph control, not turn-based blocking flows.

Failure #2: “We Receive RTP” Is Not the Same as “We Can Send RTP Back Correctly”

Many projects claim “full duplex” because they successfully receive RTP from Asterisk via ARI ExternalMedia. Receiving RTP is the easy half.

The hard half is sending RTP back to Asterisk in a way Asterisk will accept as real-time audio:

correct payload type for the negotiated codec
correct RTP timestamp increment per packet
correct sequence increment
correct pacing (one packet every 20 ms, not a 10 ms gap followed by an 80 ms gap)
stable SSRC behavior (or correct handling when SSRC changes)
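The header fields listed above can be sketched in a few lines. This is a minimal illustration, assuming 20 ms G.711 packets at 8000 Hz (160 samples per packet) and the static payload types from RFC 3551 (0 for PCMU, 8 for PCMA). The class name RtpHeaderSketch is hypothetical and is not VoiceBridge's RtpPacketizer.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: builds RTP packets for 20 ms G.711 frames. */
final class RtpHeaderSketch {
    static final int SAMPLES_PER_PACKET = 160; // 20 ms at 8000 Hz (assumption)

    private final int payloadType; // 0 = PCMU, 8 = PCMA
    private final int ssrc;        // stays constant for the whole stream
    private int sequence;          // 16-bit, wraps at 65535
    private long timestamp;        // 32-bit, counted in samples

    RtpHeaderSketch(int payloadType, int ssrc, int firstSequence, long firstTimestamp) {
        this.payloadType = payloadType;
        this.ssrc = ssrc;
        this.sequence = firstSequence & 0xFFFF;
        this.timestamp = firstTimestamp & 0xFFFFFFFFL;
    }

    /** Wraps one encoded frame in a 12-byte RTP header and advances the counters. */
    byte[] next(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(12 + payload.length); // big-endian by default
        buf.put((byte) 0x80);                 // version 2, no padding, no extension, no CSRC
        buf.put((byte) (payloadType & 0x7F)); // marker bit clear
        buf.putShort((short) sequence);
        buf.putInt((int) timestamp);
        buf.putInt(ssrc);
        buf.put(payload);
        sequence = (sequence + 1) & 0xFFFF;                          // +1 per packet
        timestamp = (timestamp + SAMPLES_PER_PACKET) & 0xFFFFFFFFL;  // +160 per packet
        return buf.array();
    }
}
```

The timestamp advances by samples, not milliseconds, which is why a 20 ms packet at 8000 Hz moves it by 160.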

Most “samples” inject audio as if it were generic UDP data, paced by naive sleeps or sent in bursts. The result:

robotic audio
late packets discarded
drift (your timestamps run away from wall-clock time)
one direction works, the other direction is silent
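One common way to avoid the drift described above is to pace against absolute deadlines instead of sleeping a fixed 20 ms after each send. The sketch below assumes 20 ms packets. PacedSenderSketch and its methods are hypothetical names, and the send step is left as a callback.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch: drift-free pacing against absolute deadlines. */
final class PacedSenderSketch {
    static final long FRAME_NANOS = 20_000_000L; // 20 ms per packet (assumption)

    /** Deadline of packet n, measured from a fixed start so sleep error never accumulates. */
    static long deadlineNanos(long startNanos, long packetIndex) {
        return startNanos + packetIndex * FRAME_NANOS;
    }

    /** Invokes sendOne once per 20 ms slot, frameCount times. */
    static void run(int frameCount, Runnable sendOne) {
        long start = System.nanoTime();
        for (int n = 0; n < frameCount; n++) {
            long wait;
            // parkNanos may return early, so re-check until the deadline has passed
            while ((wait = deadlineNanos(start, n) - System.nanoTime()) > 0) {
                LockSupport.parkNanos(wait);
            }
            sendOne.run();
        }
    }
}
```

With a plain sleep(20) loop, each iteration adds its own scheduling error and send time, so the stream slowly falls behind. Here a late packet does not delay the deadline of the next one.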

In VoiceBridge, the RTP correctness layer is explicit:

rtp/RtpPacketizer.java — constructs RTP packets with correct headers (sequence, timestamp, payload type, SSRC discipline).
queue/PlayoutScheduler.java — schedules outbound audio so packets are paced like real telephony, not burst output.
dsp/Pcm10msFramer.java — frames audio into consistent durations so timing math stays stable.
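To illustrate what consistent framing means in practice, here is a minimal sketch that turns arbitrarily sized PCM16 chunks (for example, from a TTS stream) into fixed-duration frames. It assumes mono 16-bit audio. FixedFramerSketch is a hypothetical class and does not reproduce the actual Pcm10msFramer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: re-chunks PCM16 mono audio into fixed-duration frames. */
final class FixedFramerSketch {
    private final int frameBytes;
    private final byte[] pending;
    private int filled;

    FixedFramerSketch(int sampleRateHz, int frameMillis) {
        // 2 bytes per sample, e.g. 8000 Hz * 10 ms = 80 samples = 160 bytes
        this.frameBytes = sampleRateHz * frameMillis / 1000 * 2;
        this.pending = new byte[frameBytes];
    }

    /** Accepts a chunk of any size and returns every complete frame it produced. */
    List<byte[]> push(byte[] chunk) {
        List<byte[]> frames = new ArrayList<>();
        int offset = 0;
        while (offset < chunk.length) {
            int n = Math.min(frameBytes - filled, chunk.length - offset);
            System.arraycopy(chunk, offset, pending, filled, n);
            filled += n;
            offset += n;
            if (filled == frameBytes) {
                frames.add(pending.clone()); // copy out, then reuse the buffer
                filled = 0;
            }
        }
        return frames;
    }

    /** Bytes held back until the next chunk completes a frame. */
    int pendingBytes() {
        return filled;
    }
}
```

Because every emitted frame has the same duration, the RTP timestamp step and the pacing interval stay constant regardless of how the upstream audio was chunked.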

Failure #3: NAT and Firewall Reality Destroys “One UDP Port” Designs

Real deployments involve NAT, stateful firewalls, and asymmetric routing. Asterisk might send RTP from a different source port than you expect, especially through NAT. Many “AI voice” solutions hardcode RTP endpoints:

“Send audio to the IP/port from the ARI response.”
“Bind to a UDP port and hope the packets come back.”

Under symmetric NAT or strict firewall policy, the inbound RTP source may differ from the advertised tuple. This yields classic symptoms:

you can hear caller audio, but the caller cannot hear the bot
or audio works for 10–30 seconds then goes silent (NAT mapping changes)
or duplex works on LAN but fails on WAN

VoiceBridge explicitly handles this by learning and locking the symmetric RTP endpoint:

rtp/RtpSymmetricEndpoint.java — tracks the actual (IP, port) observed on inbound RTP and uses it for outbound RTP safely.

This is the difference between “works in a demo” and “works behind real routers”.
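The core of symmetric RTP learning is small: start with the advertised address, then send to whatever source address inbound packets actually arrive from. The sketch below shows only that idea. SymmetricEndpointSketch is a hypothetical class, not the VoiceBridge RtpSymmetricEndpoint, and it simply follows the most recent source. A production policy would likely add validation before re-latching, which is out of scope here.

```java
import java.net.InetSocketAddress;

/** Hypothetical sketch: learn the real RTP peer from inbound traffic. */
final class SymmetricEndpointSketch {
    private final InetSocketAddress advertised; // address given by signaling or ARI
    private volatile InetSocketAddress learned; // address observed on inbound RTP

    SymmetricEndpointSketch(InetSocketAddress advertised) {
        this.advertised = advertised;
    }

    /** Call for each inbound packet with its UDP source. Returns true if the target changed. */
    boolean onInbound(InetSocketAddress source) {
        if (source.equals(learned)) {
            return false;
        }
        learned = source;
        return true;
    }

    /** Where outbound RTP should go right now. */
    InetSocketAddress target() {
        InetSocketAddress current = learned;
        return current != null ? current : advertised;
    }
}
```

The source address passed to onInbound would come from the received datagram, for example DatagramPacket.getSocketAddress().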

Failure #4: Incorrect ARI Media Graph (Bridge/Snoop/ExternalMedia Misuse)

Full duplex inside Asterisk is not “attach ExternalMedia and done”. The Asterisk media graph matters:

which channels are in which bridge
whether the bridge is mixing vs holding
whether you need snoop channels for read-only capture
how you avoid echo and talk/listen collisions

A common mistake is building a single bridge and assuming it will behave like a duplex media router. In practice, the most stable pattern is usually:

one leg dedicated to capture (caller → AI)
one leg dedicated to injection (AI → caller)
explicit lifecycle control during call start/stop

VoiceBridge’s ARI orchestration is implemented in:

ari/impl/AriBridgeImpl.java — constructs and manages the bridge topology required for duplex stability.
ari/impl/ExternalMediaManagerImpl.java — creates ExternalMedia channels with the correct parameters, and ensures they are attached correctly.
ari/AriWsClient.java — event-driven ARI control is required; polling-only approaches miss lifecycle timing.

If your ARI media graph is wrong, you will see “half duplex” behavior, echo-like artifacts, or silent legs.
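For orientation, the ARI operations named in this section map to a handful of REST calls. The sketch below only builds the request lines and sends nothing. The base URL, application name, and channel ids are placeholder assumptions, and it does not claim to reproduce the topology VoiceBridge builds.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: request lines for the ARI calls behind a media graph. */
final class AriRequestSketch {
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Creates a mixing bridge. */
    static String createMixingBridge(String base) {
        return "POST " + base + "/bridges?type=mixing";
    }

    /** Creates an ExternalMedia channel that exchanges RTP with hostPort. */
    static String createExternalMedia(String base, String app, String hostPort, String format) {
        return "POST " + base + "/channels/externalMedia?app=" + enc(app)
                + "&external_host=" + enc(hostPort) + "&format=" + enc(format);
    }

    /** Creates a snoop channel; spy is one of none, in, out, both. */
    static String snoop(String base, String channelId, String app, String spy) {
        return "POST " + base + "/channels/" + enc(channelId)
                + "/snoop?app=" + enc(app) + "&spy=" + enc(spy);
    }

    /** Adds a channel to an existing bridge. */
    static String addChannel(String base, String bridgeId, String channelId) {
        return "POST " + base + "/bridges/" + enc(bridgeId)
                + "/addChannel?channel=" + enc(channelId);
    }
}
```

Which channels end up in which bridge is the design decision this section is about; the calls themselves are the easy part.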

Failure #5: Codec Handling and “Invisible” Transcoding Costs

Many teams start with “PCM16 everywhere” because it’s easiest for AI models. But the caller leg usually uses telephony codecs:

PCMU/PCMA (G.711) for SIP trunks
Opus for WebRTC endpoints

The moment you mix:

PCMU inbound from Asterisk
PCM16 required by AI STT
PCM16 returned by AI TTS
PCMU required outbound back to Asterisk

…you are now doing real-time transcoding. Most “AI voice” scripts do it incorrectly or too slowly:

wrong sample rate assumptions
wrong frame boundaries (causes clicking / robotic sound)
buffering too much (latency explosion)
CPU spikes under concurrency
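As a concrete example of the conversion step, here is a minimal G.711 μ-law (PCMU) encoder and decoder for 16-bit linear samples. It follows the widely used bias-and-segment formulation of the algorithm. MuLawSketch is a hypothetical class, not the VoiceBridge PcMuCodec, and it does not cover resampling, which is a separate step when the AI side runs at a rate other than 8000 Hz.

```java
/** Hypothetical sketch: G.711 mu-law conversion for 16-bit linear PCM. */
final class MuLawSketch {
    private static final int BIAS = 0x84;  // 132, added before the segment search
    private static final int CLIP = 32635; // largest magnitude that fits after the bias

    /** Encodes one 16-bit sample into one mu-law byte. */
    static byte encode(short pcm) {
        int sample = pcm;
        int sign = (sample >> 8) & 0x80;
        if (sign != 0) sample = -sample;
        if (sample > CLIP) sample = CLIP;
        sample += BIAS;
        // Find the segment: position of the highest set bit between 0x4000 and 0x80
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1) {
            exponent--;
        }
        int mantissa = (sample >> (exponent + 3)) & 0x0F;
        return (byte) ~(sign | (exponent << 4) | mantissa); // bits are stored inverted
    }

    /** Decodes one mu-law byte into one 16-bit sample. */
    static short decode(byte ulaw) {
        int v = ~ulaw & 0xFF;
        int exponent = (v >> 4) & 0x07;
        int mantissa = v & 0x0F;
        int magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) ((v & 0x80) != 0 ? -magnitude : magnitude);
    }

    /** Decodes a whole frame: 160 mu-law bytes become 160 samples (20 ms at 8000 Hz). */
    static short[] decodeFrame(byte[] frame) {
        short[] out = new short[frame.length];
        for (int i = 0; i < frame.length; i++) {
            out[i] = decode(frame[i]);
        }
        return out;
    }
}
```

Note the size change: a 20 ms frame is 160 bytes as PCMU but 320 bytes as PCM16, so frame boundaries must be tracked in samples, not bytes.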

VoiceBridge has explicit codec and framing components:

audio/CodecFactory.java — chooses codec pipelines for a session.
audio/codec/PcMuCodec.java, audio/codec/PcMaCodec.java, audio/codec/OpusCodec.java — real codec implementations.
audio/resampler/FfmpegResampler.java — pragmatic resampling/transcoding when needed (with operational tradeoffs).

If you do not treat codec conversion as a first-class engineering concern, duplex fails under load (or sounds broken).

Failure #6: No Real Barge-In (Interruptions) Control

Full duplex is not just “two audio directions”. It also means:

the bot may be speaking
the caller starts speaking
the bot must stop speaking instantly
and the AI must receive the caller speech without being contaminated by the bot audio

Most systems fail here because they:

play long TTS audio without a stop mechanism
buffer too much audio in advance
have no “truncate” semantics toward the AI streaming engine

VoiceBridge implements barge-in as a control loop:

barge/BargeInController.java — decides when caller speech energy should cut through.
barge/AudioEnergy.java — tracks audio energy / thresholds for interruption detection.
ai/impl/OpenAiRealtimeTruncateManager.java — sends truncation signals to stop AI audio mid-stream cleanly.

Without this, you can’t deliver human-like conversational flow — you deliver “please wait for the bot to finish”.
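A simple form of energy-based interruption detection can be sketched as follows: compute the RMS of each caller frame and trigger only after several consecutive loud frames, so a single click does not cut the bot off. The threshold and frame count are illustrative assumptions, and BargeInSketch is a hypothetical class rather than the VoiceBridge BargeInController.

```java
/** Hypothetical sketch: energy-based barge-in detection on PCM16 frames. */
final class BargeInSketch {
    private final double rmsThreshold; // tune per deployment (assumption)
    private final int framesRequired;  // consecutive loud frames before triggering
    private int loudFrames;

    BargeInSketch(double rmsThreshold, int framesRequired) {
        this.rmsThreshold = rmsThreshold;
        this.framesRequired = framesRequired;
    }

    /** Root-mean-square amplitude of one frame; 0.0 for silence or an empty frame. */
    static double rms(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sum = 0;
        for (short s : frame) {
            sum += (double) s * s;
        }
        return Math.sqrt(sum / frame.length);
    }

    /** Returns true once caller speech has persisted long enough to interrupt the bot. */
    boolean onCallerFrame(short[] frame) {
        if (rms(frame) >= rmsThreshold) {
            loudFrames++;
        } else {
            loudFrames = 0; // any quiet frame resets the run
        }
        return loudFrames >= framesRequired;
    }
}
```

When onCallerFrame returns true, the caller of this class would stop outbound playout and send the truncation signal to the AI engine. Both of those steps are outside this sketch.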

Failure #7: No Session State Model (So Cleanup and Race Conditions Kill You)

Media systems are state machines. Calls end abruptly. Channels hang up. Bridges get destroyed. If your integration is “some scripts + UDP”, then over time you accumulate:

orphaned UDP sockets
port leaks (RTP ports never returned)
dangling ARI channels/bridges
threads blocked on I/O

Under concurrency, race conditions become common:

AI returns TTS after the call has already ended
you send RTP to a port now assigned to a different call
ARI sends an event after your script “thinks” it’s done
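Two guards cover most of these races: make cleanup idempotent, and reject late work once the session is closed. The sketch below shows both with a single atomic flag. SessionCloseSketch is a hypothetical class and far simpler than a real per-call state object.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: idempotent close plus a guard against late audio. */
final class SessionCloseSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable releaseResources; // sockets, ports, ARI objects, etc.

    SessionCloseSketch(Runnable releaseResources) {
        this.releaseResources = releaseResources;
    }

    /** Safe to call from hangup events, error handlers, and timeouts alike. */
    boolean close() {
        if (!closed.compareAndSet(false, true)) {
            return false; // someone else already cleaned up
        }
        releaseResources.run();
        return true;
    }

    /** Late TTS frames for an ended call are rejected instead of being sent. */
    boolean acceptTtsFrame(byte[] frame) {
        return !closed.get();
    }
}
```

The compareAndSet call ensures that cleanup runs exactly once even when a hangup event and an error handler fire at the same moment. The check in acceptTtsFrame is only a first line of defense: a frame accepted just before close() can still be in flight, so the send path needs the same check.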

VoiceBridge treats the call as a first-class session object:

session/CallSession.java — holds the per-call state (RTP endpoints, codec, AI config, bridges, lifecycle flags).
session/CallSessionManager.java — creates, tracks, and reliably cleans sessions on hangup/failure.
rtp/RtpPortAllocator.java — manages RTP ports safely so concurrency does not corrupt calls.
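Port management can be illustrated with a small pool that hands out each port to at most one call and ignores double releases. The sketch assumes a configured range and uses even ports only, following the usual RTP convention. PortPoolSketch is a hypothetical class, not the VoiceBridge RtpPortAllocator.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a leak-resistant pool of even RTP ports. */
final class PortPoolSketch {
    private final ArrayDeque<Integer> free = new ArrayDeque<>();
    private final Set<Integer> leased = new HashSet<>();

    PortPoolSketch(int firstPort, int lastPort) {
        for (int p = firstPort + (firstPort & 1); p <= lastPort; p += 2) {
            free.addLast(p); // even ports only
        }
    }

    /** Leases a port for one call. */
    synchronized int acquire() {
        Integer port = free.pollFirst();
        if (port == null) {
            throw new IllegalStateException("no free RTP ports");
        }
        leased.add(port);
        return port;
    }

    /** Returns a port to the pool; a second release of the same port is ignored. */
    synchronized boolean release(int port) {
        if (!leased.remove(port)) {
            return false;
        }
        free.addLast(port); // back of the queue, so reuse is delayed as long as possible
        return true;
    }
}
```

Returning ports to the back of the queue reduces the chance that stray packets from a call that just ended land on a port already handed to a new call.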

Failure #8: “AI Integration” That Ignores Real-Time Constraints

Many AI integrations treat speech like files:

buffer 5–10 seconds, then send for STT
wait for full LLM response
generate entire TTS audio first
then play back

That approach guarantees unnatural conversation. Full duplex requires streaming:

stream caller audio continuously to STT/AI
receive partial response quickly
start speaking while AI is still generating
be able to stop instantly on interruption
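On the playout side, these requirements translate into a small bounded queue between the AI stream and the RTP sender: frames are played as they arrive, the producer cannot run far ahead, and everything pending can be dropped at once. The sketch below is one minimal way to express that. PlayoutQueueSketch and its capacity are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical sketch: bounded, flushable queue of outbound audio frames. */
final class PlayoutQueueSketch {
    private final ArrayBlockingQueue<byte[]> frames;

    PlayoutQueueSketch(int maxFrames) {
        // e.g. 10 frames of 20 ms = at most 200 ms of audio queued ahead
        this.frames = new ArrayBlockingQueue<>(maxFrames);
    }

    /** Called as AI audio arrives; false means the producer is too far ahead. */
    boolean offer(byte[] frame) {
        return frames.offer(frame);
    }

    /** Called by the paced sender once per frame slot; null means nothing to play. */
    byte[] poll() {
        return frames.poll();
    }

    /** Called on interruption; drops all pending audio and returns how many frames were dropped. */
    int flush() {
        return frames.drainTo(new ArrayList<>());
    }
}
```

The capacity is the upper bound on how much bot audio can still play after an interruption is detected, so it directly limits barge-in latency.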

VoiceBridge’s real-time AI layer is explicitly separated and implemented as streaming clients:

ai/RealtimeAiClient.java — contract for streaming interaction.
ai/impl/RealtimeAiClientImpl.java — production logic for streaming send/receive.
ai/AiClientFactory.java — per-session AI selection/config.

What Actually Works: The Minimum Requirements for True Duplex

If you want full duplex that survives production, your solution needs all of the following:

Correct RTP packetization (headers + pacing + timestamps)
Two-leg media model where capture and injection are controlled reliably
Symmetric RTP endpoint learning for NAT correctness
Codec discipline (real conversion, not assumptions)
Streaming AI integration (not file-based turns)
Barge-in controller (detect speech, truncate bot audio, resume listening)
Session lifecycle state machine (cleanup, retries, failure handling)
Scalability primitives (port allocation, concurrency, backpressure)

This is why “AI calling” isn’t a single feature — it’s a system.

Why VoiceBridge Doesn’t Fall Into These Traps

VoiceBridge was built specifically around the failure modes above. It is not “ARI + UDP”. It is a real media bridge with explicit modules for RTP correctness, NAT safety, barge-in, codec handling, and event-driven ARI session control.

If you want to explore the implementation, start with these core files:

ARI + bridges: AriBridgeImpl.java, ExternalMediaManagerImpl.java
RTP correctness + NAT: RtpPacketizer.java, RtpSymmetricEndpoint.java, RtpPortAllocator.java
Session lifecycle: CallSession.java, CallSessionManager.java
Barge-in + truncation: BargeInController.java, OpenAiRealtimeTruncateManager.java

Practical Checklist: How to Spot a “Fake Duplex” Solution

If someone claims they have “full duplex AI voice for Asterisk”, ask these questions:

Can the caller interrupt the bot, and does the bot stop in under 200 ms?
Do they send outbound RTP with correct timestamps and stable pacing, or bursts?
Do they handle symmetric NAT by learning the inbound RTP source endpoint?
Do they support real concurrency with clean port allocation and cleanup?
Do they have a session state model, or is it “scripts and sockets”?
Can they show Wireshark traces proving both RTP directions are continuous and stable?

If the answers are vague, the solution is not production duplex.

Conclusion

Most “AI voice for Asterisk” attempts fail at full duplex because they solve the AI part and ignore the telecom part. Real duplex requires engineering RTP correctness, NAT survival, bridge topology, codec conversion, barge-in control, and lifecycle state — all at once, under load.

The open-source path that addresses these realities directly is MYLINEHUB VoiceBridge. Explore the project here: https://github.com/mylinehub/omnichannel-crm/tree/main/mylinehub-voicebridge.
