VoiceBridge vs AGI-Based Voice Bots — Turn-Based vs Real-Time
Why AGI bots feel like IVR and why VoiceBridge enables real conversational duplex voice, including barge-in and streaming responses.
VoiceBridge vs AGI-Based Voice Bots — Turn-Based vs Real-Time Duplex
Many Asterisk voice bots start with AGI (Asterisk Gateway Interface) because it’s simple: execute script → play prompt → record audio → send to STT → generate response → play audio → repeat.
That model works for IVR-style automation. It does not produce real-time, interruption-capable, conversational duplex AI.
This article explains the architectural difference between:
- AGI-based, turn-based voice bots
- VoiceBridge (ARI + ExternalMedia + RTP duplex engine)
VoiceBridge repository:
https://github.com/mylinehub/omnichannel-crm/tree/main/mylinehub-voicebridge
The Core Architectural Difference
AGI Model (Turn-Based)
AGI works as a blocking script execution model inside the dialplan.
- Dialplan calls AGI script
- Script plays audio file
- Script records caller input
- Script sends audio to STT
- Script generates reply
- Script plays next audio file
The call alternates between: bot speaking → caller speaking.
This is fundamentally half-duplex.
VoiceBridge Model (Real-Time Duplex)
VoiceBridge does not use AGI for media. It uses:
- ARI for event-driven call control
- ExternalMedia for RTP capture and injection
- A dedicated RTP engine for timing correctness
The caller and bot can speak simultaneously. Audio flows continuously in both directions.
Why AGI Is Structurally Turn-Based
AGI executes synchronously inside the dialplan.
- Playback() blocks until audio finishes
- Record() blocks until silence or timeout
- No continuous RTP streaming
During playback:
- Caller interruption is delayed
- Barge-in requires hacky DTMF or polling tricks
- Speech overlap is not naturally supported
Even FastAGI does not change the media blocking model: it moves the script to a network server, but each command still blocks until Asterisk replies.
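The blocking behavior is easiest to see at the protocol level. The following is a minimal sketch, assuming a script that speaks AGI over a reader/writer pair; `AgiTurnSketch` and its method names are hypothetical and are not taken from any repository. `STREAM FILE` and `RECORD FILE` are standard AGI commands, and the replies in `demo()` are canned stand-ins for what Asterisk would send.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

/** Illustrative AGI turn (hypothetical class, not part of VoiceBridge). */
final class AgiTurnSketch {
    private final BufferedReader fromAsterisk;
    private final PrintWriter toAsterisk;

    AgiTurnSketch(BufferedReader fromAsterisk, PrintWriter toAsterisk) {
        this.fromAsterisk = fromAsterisk;
        this.toAsterisk = toAsterisk;
    }

    /** Sends one AGI command and blocks until Asterisk answers with a result line. */
    String exec(String command) {
        toAsterisk.print(command + "\n");
        toAsterisk.flush();
        try {
            String reply = fromAsterisk.readLine(); // nothing else happens until this returns
            if (reply == null) {
                throw new IllegalStateException("AGI channel closed");
            }
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One turn: play the prompt to the end (or until '#'), then record the caller. */
    String[] runTurn(String promptFile, String recordingFile) {
        // A real script would first consume the agi_* environment lines; omitted here.
        String played = exec("STREAM FILE " + promptFile + " \"#\"");
        // 10000 ms maximum length, stop after 2 s of silence.
        String recorded = exec("RECORD FILE " + recordingFile + " wav \"#\" 10000 s=2");
        return new String[] {played, recorded};
    }

    /** Replays canned replies and returns the commands the script sent, in order. */
    static String demo() {
        StringWriter sent = new StringWriter();
        BufferedReader replies = new BufferedReader(new StringReader(
                "200 result=0 endpos=8000\n200 result=0 (timeout) endpos=80000\n"));
        new AgiTurnSketch(replies, new PrintWriter(sent)).runTurn("welcome", "/tmp/caller-turn");
        return sent.toString();
    }
}
```

The only interruption hook in this exchange is the escape-digit argument (`"#"`), which is why barge-in on AGI tends to mean DTMF rather than speech.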
VoiceBridge Duplex Media Architecture
VoiceBridge constructs a real media graph using ARI:
- ari/impl/AriBridgeImpl.java: builds mixing bridges
- ari/impl/ExternalMediaManagerImpl.java: creates RTP channels
RTP capture and injection are handled in:
- rtp/RtpPacketizer.java
- rtp/RtpSymmetricEndpoint.java
- rtp/RtpPortAllocator.java
Audio streams continuously in both directions.
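For orientation, here is a hedged sketch of the ARI request that creates an ExternalMedia channel. It uses the standard ARI endpoint `POST /channels/externalMedia` with its `app`, `external_host`, and `format` parameters. The class name, base URL, application name, port, and credentials are assumptions for illustration, and the repository's `ExternalMediaManagerImpl` may build the request differently.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds (but does not send) an ARI ExternalMedia request. Hypothetical helper. */
final class ExternalMediaRequestSketch {

    /** external_host is the host:port where the RTP engine listens for Asterisk's audio. */
    static URI uri(String ariBase, String app, String rtpHost, int rtpPort, String format) {
        String query = "app=" + enc(app)
                + "&external_host=" + enc(rtpHost + ":" + rtpPort)
                + "&format=" + enc(format);
        return URI.create(ariBase + "/channels/externalMedia?" + query);
    }

    /** ARI accepts HTTP basic authentication; the body is empty because parameters are in the query. */
    static HttpRequest build(URI uri, String user, String password) {
        String basic = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + basic)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

In a typical ARI flow the resulting channel is then added to a mixing bridge alongside the caller's channel (`POST /bridges` with `type=mixing`, then `POST /bridges/{bridgeId}/addChannel`), so caller audio is forwarded to the external host as RTP and RTP sent back is played to the caller.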
Latency Comparison
AGI
- Playback must finish before recording
- STT only starts after recording stops
- Full round-trip delay per turn
This creates noticeable conversational lag.
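The arithmetic behind that lag is simple addition: in a turn-based pipeline no stage starts until the previous one ends. A tiny sketch follows, with a hypothetical helper and placeholder durations that are illustrative only, not measurements of any system.

```java
/** Per-turn delay in a strictly sequential pipeline. Hypothetical helper. */
final class TurnLatencySketch {

    /** Each stage waits for the previous one, so the caller-perceived delay is the sum. */
    static long sequentialMs(long... stageMs) {
        long total = 0;
        for (long ms : stageMs) {
            total += ms;
        }
        return total;
    }
}
```

For example, with placeholder values of 2000 ms for the end-of-recording silence timeout, 800 ms for STT, 1200 ms for reply generation, and 600 ms for speech synthesis, `TurnLatencySketch.sequentialMs(2000, 800, 1200, 600)` returns 4600, and that delay is paid again on every turn.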
VoiceBridge
- Caller audio streamed continuously
- AI processes in parallel
- TTS streamed back while caller still active
- Immediate truncation on interruption
Barge-in control is implemented in:
ai/impl/OpenAiRealtimeTruncateManager.java
Barge-In Capability
AGI
- No natural barge-in
- Requires DTMF detection tricks
- Playback usually must finish
VoiceBridge
- Speech detection runs continuously
- Outbound RTP stream can be stopped mid-frame
- Bot audio truncation is deterministic
Controlled by:
ai/TruncateManager.java
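One way to picture deterministic truncation is an outbound frame queue that is flushed the moment caller speech is detected. The sketch below is an assumption-laden illustration, not the repository's `TruncateManager`: the class and method names are hypothetical, frames are assumed to be 20 ms each, and playback stops at the next frame boundary rather than inside a frame. The JSON shape follows the OpenAI Realtime API's `conversation.item.truncate` client event, which tells the model how much of its audio the caller actually heard.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Outbound audio queue with barge-in flush. Hypothetical class for illustration. */
final class BargeInBufferSketch {
    private static final int FRAME_MS = 20; // assumed frame duration

    private final Deque<byte[]> pending = new ArrayDeque<>();
    private int framesSent;

    /** TTS side: queue one frame of bot audio. */
    synchronized void enqueue(byte[] frame) {
        pending.addLast(frame);
    }

    /** Pacing loop side: returns the next frame to send, or null if there is none. */
    synchronized byte[] nextFrame() {
        byte[] frame = pending.pollFirst();
        if (frame != null) {
            framesSent++;
        }
        return frame;
    }

    /** Caller started speaking: drop unsent audio and report how many ms were played. */
    synchronized int bargeIn() {
        pending.clear();
        int playedMs = framesSent * FRAME_MS;
        framesSent = 0;
        return playedMs;
    }

    /** Builds the truncate event for the interrupted response item. */
    static String truncateEvent(String itemId, int audioEndMs) {
        return "{\"type\":\"conversation.item.truncate\",\"item_id\":\"" + itemId
                + "\",\"content_index\":0,\"audio_end_ms\":" + audioEndMs + "}";
    }
}
```

Because the played duration is derived from frames actually handed to the sender, the value reported to the model matches what the caller heard, up to one frame of granularity.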
Media Quality and Timing
AGI
- Relies on file playback
- No control over RTP pacing
- Limited real-time adjustments
VoiceBridge
- Strict RTP timestamp control
- Sequence integrity maintained
- SSRC stability enforced
Implemented in:
rtp/RtpPacketizer.java
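The three properties above map directly onto fields of the 12-byte RTP header defined in RFC 3550. The following minimal sketch assumes 20 ms frames of 8 kHz audio (160 samples per packet) and a fixed payload type; `SimpleRtpPacketizer` is a hypothetical class and makes no claim about how the repository's `RtpPacketizer.java` is written.

```java
import java.nio.ByteBuffer;

/** Minimal RTP packetizer for fixed-size audio frames. Hypothetical class. */
final class SimpleRtpPacketizer {
    private static final int HEADER_BYTES = 12;
    private static final int SAMPLES_PER_FRAME = 160; // 20 ms at 8 kHz (assumption)

    private final int ssrc;   // stays constant for the whole stream
    private int sequence;     // 16-bit, wraps at 65535
    private long timestamp;   // 32-bit, counts samples rather than milliseconds

    SimpleRtpPacketizer(int ssrc, int initialSequence, long initialTimestamp) {
        this.ssrc = ssrc;
        this.sequence = initialSequence & 0xFFFF;
        this.timestamp = initialTimestamp & 0xFFFFFFFFL;
    }

    /** Wraps one frame in an RTP header and advances sequence and timestamp. */
    byte[] packetize(byte[] payload, int payloadType) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length); // big-endian by default
        buf.put((byte) 0x80);                 // version 2, no padding, no extension, no CSRC
        buf.put((byte) (payloadType & 0x7F)); // marker bit clear
        buf.putShort((short) sequence);
        buf.putInt((int) timestamp);
        buf.putInt(ssrc);
        buf.put(payload);

        sequence = (sequence + 1) & 0xFFFF;
        timestamp = (timestamp + SAMPLES_PER_FRAME) & 0xFFFFFFFFL;
        return buf.array();
    }
}
```

Advancing the timestamp by the sample count of each frame, independent of wall-clock jitter in the sending thread, is what lets the receiver's jitter buffer reconstruct even pacing.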
Scalability Differences
AGI Scaling Characteristics
- Each call blocks script execution
- Heavy process/thread usage under load
- Limited observability into media layer
VoiceBridge Scaling Characteristics
- Per-call session object: session/CallSession.java
- Deterministic RTP port allocation
- Containerized deployment supported: docker/Dockerfile, docker-compose.yml
- Kubernetes-ready scaling (horizontal replicas)
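As one possible reading of "deterministic RTP port allocation", the sketch below always hands out the lowest free even port in a configured range and returns ports to the pool on release. The class name, the even-port convention, and the range are assumptions; the repository's `RtpPortAllocator.java` may use a different policy.

```java
import java.util.BitSet;

/** Lowest-free-even-port allocator over a fixed range. Hypothetical class. */
final class EvenPortAllocatorSketch {
    private final int base;      // first even port in the range
    private final int slots;     // number of even ports available
    private final BitSet used = new BitSet();

    EvenPortAllocatorSketch(int firstPort, int lastPort) {
        this.base = firstPort + (firstPort & 1); // round up to an even port
        if (lastPort < base) {
            throw new IllegalArgumentException("empty port range");
        }
        this.slots = (lastPort - base) / 2 + 1;
    }

    /** Returns the lowest free even port, so the same state always yields the same port. */
    synchronized int allocate() {
        int slot = used.nextClearBit(0);
        if (slot >= slots) {
            throw new IllegalStateException("RTP port range exhausted");
        }
        used.set(slot);
        return base + 2 * slot;
    }

    /** Frees a port previously returned by allocate(); other values are ignored. */
    synchronized void release(int port) {
        int offset = port - base;
        if (offset >= 0 && offset % 2 == 0 && offset / 2 < slots) {
            used.clear(offset / 2);
        }
    }
}
```

A bounded, predictable range also makes firewall rules and container port mappings straightforward, since the ports a replica can use are known up front.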
Operational Stability
AGI Risks
- Script crash ends call abruptly
- Blocking I/O delays entire call flow
- Difficult RTP-level debugging
VoiceBridge Stability
- Explicit session lifecycle management
- Graceful cleanup on hangup
- Separation of control plane (ARI) and media plane (RTP)
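Graceful cleanup usually comes down to two guarantees: every resource acquired for a call is released exactly once, and a failure while releasing one resource does not prevent the others from being released. A minimal sketch under those assumptions follows; `SessionCleanupSketch` is hypothetical and is not the repository's `CallSession`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicBoolean;

/** Per-call cleanup registry with idempotent close. Hypothetical class. */
final class SessionCleanupSketch implements AutoCloseable {
    private final Deque<Runnable> cleanups = new ArrayDeque<>();
    private final AtomicBoolean closed = new AtomicBoolean();

    /** Registers a release action (for example: free RTP port, destroy bridge). */
    synchronized void onClose(Runnable action) {
        cleanups.push(action);
    }

    private synchronized Runnable nextCleanup() {
        return cleanups.poll();
    }

    /** Runs release actions in reverse registration order; safe to call more than once. */
    @Override
    public void close() {
        if (!closed.compareAndSet(false, true)) {
            return; // a second hangup event must not release resources twice
        }
        Runnable action;
        while ((action = nextCleanup()) != null) {
            try {
                action.run();
            } catch (RuntimeException e) {
                // Keep going so one failed release does not leak the remaining resources.
            }
        }
    }
}
```

Calling `close()` from the hangup event handler, and again from any error path, gives the same end state either way.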
When AGI Is Still Appropriate
- Menu-based IVR systems
- DTMF-driven automation
- Simple turn-based bots
If your goal is structured prompts and short responses, AGI is perfectly fine.
When VoiceBridge Is Required
- Natural conversational AI
- Interruption support (barge-in)
- Simultaneous speak/listen
- Low-latency real-time dialog
- Hundreds of concurrent AI calls
Final Comparison Summary
| Feature | AGI Bot | VoiceBridge |
|---|---|---|
| Duplex Audio | No (Turn-Based) | Yes (Real-Time) |
| Barge-In | Limited | Immediate |
| RTP Control | None | Full Control |
| Scalability | Script-bound | Service-Oriented |
| Latency Model | Per-Turn | Continuous |
AGI is ideal for traditional IVR logic. VoiceBridge is built for modern conversational AI where real-time duplex behavior is mandatory.
Repo:
https://github.com/mylinehub/omnichannel-crm/tree/main/mylinehub-voicebridge