Why AGI Cannot Provide Real-Time Duplex Voice (Technical Limits)
Why AGI is turn-based by design: buffering, blocking I/O, audio control limits, and why it cannot deliver true full-duplex conversational voice.
Many Asterisk-based AI calling systems start with AGI (Asterisk Gateway Interface). It is simple, familiar, and easy to attach to a dialplan. But the moment you try to build a natural, interruptible, real-time voice assistant, AGI becomes the bottleneck.
This is not a performance tuning issue. It is not a threading problem. It is not a scaling problem. It is a design limitation.
Real-time duplex voice requires continuous bidirectional RTP media control. AGI was never designed for that.
What Real-Time Duplex Actually Requires
A true duplex AI voice system must:
- Stream caller audio continuously to AI (not in chunks after playback).
- Generate AI speech while still receiving caller audio.
- Detect caller interruptions instantly (barge-in detection).
- Stop playback mid-utterance, at the next frame boundary, without waiting for file completion.
- Maintain stable RTP timing (20ms cadence, monotonic timestamps).
In protocol terms, duplex means:
- Two active RTP directions.
- No blocking media primitives.
- Application-level control over bridge and media state.
AGI does not provide this control surface.
How AGI Actually Works
AGI is synchronous and command-based.
The call flow looks like this:
Caller → Dialplan → AGI script
AGI sends command → Asterisk executes → AGI waits
Each AGI command blocks until completion:
- STREAM FILE blocks until playback finishes.
- RECORD FILE blocks until recording ends.
- GET DATA blocks waiting for input.
While one operation is running, AGI cannot simultaneously process another.
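The request/response cycle above can be sketched in a few lines of Python. This is a simplified illustration, not a complete AGI client: the helper name agi_command is hypothetical, the reply is simulated with an in-memory stream, and a real script would use sys.stdin and sys.stdout and would first read the agi_* environment lines that Asterisk sends at startup.

```python
import io


def agi_command(command, agi_in, agi_out):
    """Send one AGI command and block until Asterisk replies.

    Hypothetical helper for illustration. In a real AGI script,
    agi_in is sys.stdin and agi_out is sys.stdout.
    """
    agi_out.write(command + "\n")
    agi_out.flush()
    # readline() returns only after Asterisk has finished the command,
    # so the script does nothing else while audio is playing or recording.
    reply = agi_in.readline().strip()
    code, _, rest = reply.partition(" ")
    return int(code), rest


if __name__ == "__main__":
    # Simulated Asterisk side: the reply becomes available only once playback is over.
    fake_asterisk = io.StringIO("200 result=0 endpos=16000\n")
    sent = io.StringIO()
    print(agi_command('STREAM FILE welcome ""', fake_asterisk, sent))
```

The single blocking readline() is the whole concurrency model: one command in flight, one reply, and no point at which the script can inspect caller audio in between.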
The Blocking Playback Problem
The biggest limitation: STREAM FILE is blocking.
When Asterisk plays audio through AGI:
- The channel enters playback state.
- AGI waits for playback to finish.
- No concurrent audio capture occurs.
That means:
- You cannot detect spoken interruptions during playback; STREAM FILE reacts only to DTMF escape digits.
- You cannot cut speech mid-stream reliably.
- You cannot stream AI-generated audio in small RTP frames.
The system becomes turn-based:
AI speaks → Caller waits
Caller speaks → AI waits
This is half-duplex, not real duplex.
No Direct RTP Access
Duplex voice requires tight control over RTP:
- Precise 20ms frame timing
- Stable sequence numbers
- Monotonic timestamp increments
- Payload type consistency
AGI does not expose RTP streams.
It only allows high-level media commands. You cannot inject raw RTP frames. You cannot read live RTP in parallel. You cannot control SSRC behavior.
Without RTP-level access, real duplex media engineering is impossible.
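To make the RTP requirements listed above concrete, here is a minimal sketch of what an application does once it owns the RTP stream. It assumes 8 kHz G.711 mu-law audio (payload type 0) in 20ms frames, which gives 160 samples per packet; the function names and the SSRC value are illustrative, and real-time pacing is left out.

```python
import struct

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples per 20ms frame


def rtp_packet(seq, timestamp, ssrc, payload, payload_type=0):
    """Build an RTP packet with the 12-byte fixed header (no CSRC list, no extension)."""
    first_byte = 2 << 6                 # version 2, padding=0, extension=0, CSRC count=0
    second_byte = payload_type & 0x7F   # marker bit clear, 7-bit payload type
    header = struct.pack(
        "!BBHII",
        first_byte,
        second_byte,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def packetize(frames, ssrc, first_seq=0, first_ts=0):
    """Yield one packet per frame with stable sequence and timestamp increments."""
    seq, ts = first_seq, first_ts
    for frame in frames:
        yield rtp_packet(seq, ts, ssrc, frame)
        seq = (seq + 1) & 0xFFFF                    # sequence number: +1 per packet
        ts = (ts + SAMPLES_PER_FRAME) & 0xFFFFFFFF  # timestamp: +160 per 20ms frame


if __name__ == "__main__":
    silence = bytes([0xFF]) * SAMPLES_PER_FRAME     # mu-law encoded silence
    for packet in packetize([silence] * 3, ssrc=0x1234):
        print(struct.unpack("!BBHII", packet[:12]))
```

None of these fields can be read or written from an AGI script, which is why this kind of control has to live outside AGI.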
Why Barge-In Fails Under AGI
True conversational AI must support barge-in:
- User interrupts AI.
- AI stops speaking instantly.
- System switches to listening state immediately.
Under AGI:
- Playback blocks execution.
- Speech detection can start only after playback completes.
- Speech overlap cannot be resolved cleanly.
The result is unnatural conversation flow.
Latency Amplification
AGI introduces additional latency layers:
- Dialplan → AGI process handoff
- STDIN/STDOUT command parsing
- Command execution round-trips
In AI voice systems, even 200–300ms delays degrade naturalness. AGI makes low-latency conversational timing extremely difficult.
Why ARI + ExternalMedia Solves This
Real duplex requires asynchronous media control.
ARI provides:
- Event-driven channel state control
- Bridge manipulation in real time
- ExternalMedia channels for raw RTP streaming
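An ExternalMedia channel is requested through ARI's REST interface with a POST to /channels/externalMedia. The sketch below only builds that request URL, so it runs without an Asterisk server. The base URL, the Stasis application name "voicebot" and the RTP listener address are placeholders, and authentication plus the call that adds the new channel to a bridge are omitted.

```python
from urllib.parse import urlencode


def external_media_url(ari_base, app, external_host, audio_format="ulaw"):
    """Return the URL for a POST that asks ARI to create an ExternalMedia channel.

    app           -- name of the Stasis application that will own the channel
    external_host -- host:port where your application listens for RTP
    audio_format  -- codec Asterisk should use on that leg
    """
    query = urlencode({
        "app": app,
        "external_host": external_host,
        "format": audio_format,
    })
    return f"{ari_base}/channels/externalMedia?{query}"


if __name__ == "__main__":
    # Placeholder values; substitute your own ARI endpoint and RTP listener.
    print(external_media_url("http://localhost:8088/ari", "voicebot", "127.0.0.1:4000"))
```

Once the channel exists and is bridged with the caller, audio arrives at external_host as ordinary RTP packets rather than as the result of a blocking command.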
With ExternalMedia:
- Asterisk sends live RTP to your application.
- Your application sends RTP back concurrently.
- No blocking playback primitives are involved.
That enables:
- True bidirectional streaming
- Instant barge-in detection
- Frame-level playback control
- Stable RTP timing discipline
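Frame-level playback control and barge-in can be sketched as a loop that sends one frame at a time and checks an interruption flag between frames. This is an illustration under assumptions: play_frames, send_frame and barge_in are hypothetical names, the flag would be set by whatever speech detector is watching the inbound RTP, and a production sender would pace against a monotonic clock instead of a fixed wait to avoid drift.

```python
import threading

FRAME_S = 0.020  # one 20ms frame


def play_frames(frames, send_frame, barge_in, frame_s=FRAME_S):
    """Send frames one by one and stop as soon as barge_in is set.

    Returns the number of frames actually sent.
    """
    sent = 0
    for frame in frames:
        send_frame(frame)
        sent += 1
        # wait() paces the loop and returns True as soon as the flag is set,
        # so playback stops within one frame of the interruption.
        if barge_in.wait(frame_s):
            break
    return sent


if __name__ == "__main__":
    barge_in = threading.Event()
    # Simulate the caller starting to speak 70ms into the AI's reply.
    threading.Timer(0.070, barge_in.set).start()
    count = play_frames([b"\xff" * 160] * 50, lambda frame: None, barge_in)
    print(f"stopped after {count} of 50 frames")
```

The contrast with STREAM FILE is the granularity: here the decision to keep talking is re-evaluated every 20ms, while a blocking playback command gives the application no such point to intervene.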
The Architectural Difference
AGI Model:
Caller ↔ Asterisk ↔ AGI (commands)
Media is controlled by blocking instructions.
ARI + ExternalMedia Model:
Caller ↔ Bridge ↔ ExternalMedia ↔ Application
Media flows continuously in both directions. Control is event-driven and asynchronous.
Production Reality
Many systems attempt to “extend” AGI for AI. They eventually hit these symptoms:
- One-way audio
- Playback delay
- Missed interruptions
- Conversation overlap artifacts
- High perceived latency
These are not bugs. They are architectural mismatches.
Final Conclusion
AGI was designed for scripted IVR logic, not real-time AI conversation.
Real-time duplex voice requires:
- Non-blocking media handling
- Bidirectional RTP control
- Bridge-level orchestration
- Event-driven state management
Those capabilities exist in ARI + ExternalMedia — not in AGI.
If your goal is a natural, interruptible, human-like AI conversation, AGI is structurally incapable of delivering it.