Why AGI Cannot Provide Real-Time Duplex Voice (Technical Limits)

MYLINEHUB Team • 2026-02-19 • 10 min

Why AGI is turn-based by design: buffering, blocking I/O, audio control limits, and why it cannot deliver true full-duplex conversational voice.

Many Asterisk-based AI calling systems start with AGI (Asterisk Gateway Interface). It is simple, familiar, and easy to attach to a dialplan. But the moment you try to build a natural, interruptible, real-time voice assistant, AGI becomes the bottleneck.

This is not a performance tuning issue. It is not a threading problem. It is not a scaling problem. It is a design limitation.

Real-time duplex voice requires continuous bidirectional RTP media control. AGI was never designed for that.

What Real-Time Duplex Actually Requires

A true duplex AI voice system must:

Stream caller audio continuously to AI (not in chunks after playback).
Generate AI speech while still receiving caller audio.
Detect caller interruptions instantly (barge-in detection).
Stop playback mid-utterance, within a frame or two of the interruption, without waiting for file completion.
Maintain stable RTP timing (20 ms cadence, monotonic timestamps).

In protocol terms, duplex means:

Two active RTP directions.
No blocking media primitives.
Application-level control over bridge and media state.

AGI does not provide this control surface.
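To make the 20 ms cadence concrete, here is a small arithmetic sketch of what one frame carries. The two codecs shown (8 kHz G.711 mu-law and 16 kHz 16-bit linear PCM) are illustrative assumptions, not formats the article prescribes.

```python
# Frame sizing for a 20 ms packetization interval.
PTIME_MS = 20


def samples_per_frame(sample_rate_hz: int, ptime_ms: int = PTIME_MS) -> int:
    """Number of audio samples carried by one frame."""
    return sample_rate_hz * ptime_ms // 1000


def bytes_per_frame(sample_rate_hz: int, bytes_per_sample: int,
                    ptime_ms: int = PTIME_MS) -> int:
    """Payload size of one frame for an uncompressed or 1:1 codec."""
    return samples_per_frame(sample_rate_hz, ptime_ms) * bytes_per_sample


ulaw_8k = bytes_per_frame(8000, 1)     # G.711 mu-law: 1 byte per sample
slin_16k = bytes_per_frame(16000, 2)   # 16-bit linear PCM at 16 kHz
frames_per_second = 1000 // PTIME_MS

print(ulaw_8k, slin_16k, frames_per_second)
```

At this cadence the application has to produce and consume 50 frames per second in each direction, which is the workload a blocking command interface cannot interleave.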

How AGI Actually Works

AGI is synchronous and command-based.

The call flow looks like this:


Caller → Dialplan → AGI script
AGI sends command → Asterisk executes → AGI waits

Each AGI command blocks until completion:

STREAM FILE blocks until playback finishes or the caller presses one of the escape digits.
RECORD FILE blocks until recording ends (timeout, silence, or escape digit).
GET DATA blocks waiting for DTMF input.

While one operation is running, AGI cannot simultaneously process another.
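The command/wait cycle can be sketched in a few lines of Python. This is a minimal illustration, not a complete AGI library: the helper names (`agi_command`, `speak_then_listen`), the file names, and the simulated replies are assumptions made for the example. A real script would use `sys.stdin` and `sys.stdout` and would first consume the `agi_*` environment lines Asterisk sends at startup.

```python
import io
import re
from typing import TextIO, Tuple

_RESULT = re.compile(r"^200 result=(-?\d+)")


def agi_command(command: str, agi_in: TextIO, agi_out: TextIO) -> int:
    """Send one AGI command, then block until Asterisk replies."""
    agi_out.write(command + "\n")
    agi_out.flush()
    reply = agi_in.readline()  # nothing else runs until the command is done
    match = _RESULT.match(reply)
    if match is None:
        raise RuntimeError(f"unexpected AGI reply: {reply!r}")
    return int(match.group(1))


def speak_then_listen(agi_in: TextIO, agi_out: TextIO,
                      prompt: str = "ai-reply",
                      recording: str = "caller-turn") -> Tuple[int, int]:
    """One conversational turn: play a prompt, then record the caller."""
    played = agi_command(f'STREAM FILE {prompt} "#"', agi_in, agi_out)
    # Recording can only start once the playback command has returned.
    recorded = agi_command(f'RECORD FILE {recording} wav "#" 10000',
                           agi_in, agi_out)
    return played, recorded


# Simulated Asterisk replies (illustrative values only).
fake_in = io.StringIO("200 result=0 endpos=16000\n"
                      "200 result=35 (dtmf) endpos=8000\n")
fake_out = io.StringIO()
turn = speak_then_listen(fake_in, fake_out)
sent = fake_out.getvalue().splitlines()
```

The two commands are strictly sequential: the script has no point at which it can listen while the prompt is still playing.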

The Blocking Playback Problem

The biggest limitation: STREAM FILE is blocking.

When Asterisk plays audio through AGI:

The channel enters playback state.
AGI waits for playback to finish.
The script receives no caller audio while playback runs.

That means:

You cannot detect spoken interruptions during playback; only a DTMF escape digit can end STREAM FILE early.
You cannot cut speech mid-stream reliably.
You cannot stream AI-generated audio in small RTP frames.

The system becomes turn-based:


AI speaks → Caller waits
Caller speaks → AI waits

This is half-duplex, not real duplex.

No Direct RTP Access

Duplex voice requires tight control over RTP:

Precise 20ms frame timing
Stable sequence numbers
Monotonic timestamp increments
Payload type consistency

AGI does not expose RTP streams.

It only allows high-level media commands. You cannot inject raw RTP frames. You cannot read live RTP in parallel. You cannot control SSRC behavior.

Without RTP-level access, real duplex media engineering is impossible.
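For reference, this is roughly what RTP-level control looks like when an application does own the stream. The sketch below packs the 12-byte fixed RTP header from RFC 3550 and advances the sequence number and timestamp once per 20 ms frame. The class name `RtpSender` is hypothetical, and starting both counters at zero is a simplification for readability (RFC 3550 recommends random initial values).

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpSender:
    ssrc: int
    payload_type: int = 0          # 0 = PCMU (G.711 mu-law), static type
    samples_per_frame: int = 160   # 20 ms at 8 kHz
    seq: int = 0                   # simplification: should start random
    timestamp: int = 0             # simplification: should start random

    def packetize(self, payload: bytes, marker: bool = False) -> bytes:
        """Prefix one audio frame with a 12-byte RTP header."""
        header = struct.pack(
            "!BBHII",
            0x80,                                        # version 2, no padding/ext/CSRC
            (int(marker) << 7) | (self.payload_type & 0x7F),
            self.seq,
            self.timestamp,
            self.ssrc,
        )
        # Advance state for the next frame: +1 packet, +160 samples.
        self.seq = (self.seq + 1) & 0xFFFF
        self.timestamp = (self.timestamp + self.samples_per_frame) & 0xFFFFFFFF
        return header + payload


sender = RtpSender(ssrc=0x1234ABCD)
first = sender.packetize(b"\xff" * 160, marker=True)
second = sender.packetize(b"\xff" * 160)
```

None of these fields (sequence number, timestamp, payload type, SSRC) is reachable from an AGI script, which only sees command results.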

Why Barge-In Fails Under AGI

True conversational AI must support barge-in:

User interrupts AI.
AI stops speaking instantly.
System switches to listening state immediately.

Under AGI:

Playback blocks execution.
Speech input detection happens only after playback completes.
Speech overlap cannot be resolved cleanly.

The result is unnatural conversation flow.
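By contrast, barge-in in a frame-based design amounts to checking the inbound audio on every 20 ms tick while outbound frames are being sent. The following sketch simulates that loop over in-memory frames. The energy threshold, the three-frame hold, and the function names are assumptions for illustration; a production system would normally use a proper voice activity detector rather than raw amplitude.

```python
import struct
from typing import Iterable


def frame_energy(pcm16: bytes) -> float:
    """Mean absolute amplitude of a 16-bit little-endian PCM frame."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[:count * 2])
    return sum(abs(s) for s in samples) / count


def play_with_barge_in(tts_frames: Iterable[bytes],
                       caller_frames: Iterable[bytes],
                       threshold: float = 500.0,
                       hold_frames: int = 3) -> int:
    """Return how many TTS frames were sent before the caller cut in."""
    loud = 0
    sent = 0
    # Each loop iteration models one 20 ms tick: one frame in, one frame out.
    for out_frame, in_frame in zip(tts_frames, caller_frames):
        loud = loud + 1 if frame_energy(in_frame) >= threshold else 0
        if loud >= hold_frames:
            break          # stop playback immediately, switch to listening
        sent += 1          # a real sender would transmit out_frame here
    return sent


silence = b"\x00\x00" * 160
speech = struct.pack("<160h", *([2000] * 160))
tts = [silence] * 50                        # a 1-second AI utterance
caller = [silence] * 10 + [speech] * 40     # caller starts talking at 200 ms
frames_sent = play_with_barge_in(tts, caller)
```

In this simulation playback stops after 12 of 50 frames, i.e. within three frames of the caller starting to speak. The loop depends on reading and writing audio in the same tick, which is exactly what a blocking playback command rules out.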

Latency Amplification

AGI introduces additional latency layers:

Dialplan → AGI process handoff
STDIN/STDOUT command parsing
Command execution round-trips

In AI voice systems, even an extra 200–300 ms of delay degrades naturalness. AGI makes low-latency conversational timing extremely difficult.

Why ARI + ExternalMedia Solves This

Real duplex requires asynchronous media control.

ARI provides:

Event-driven channel state control
Bridge manipulation in real time
ExternalMedia channels for raw RTP streaming

With ExternalMedia:

Asterisk sends live RTP to your application.
Your application sends RTP back concurrently.
No blocking playback primitives are involved.

That enables:

True bidirectional streaming
Instant barge-in detection
Frame-level playback control
Stable RTP timing discipline
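As a rough sketch of how an ExternalMedia channel is requested, the snippet below builds (but does not send) the ARI `POST /channels/externalMedia` call. The base URL, credentials, application name, and RTP listener address are placeholder assumptions; the query parameters follow the ARI channels resource, and the helper name `build_external_media_request` is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_external_media_request(ari_base: str, username: str, password: str,
                                 app: str, external_host: str,
                                 audio_format: str = "ulaw") -> urllib.request.Request:
    """Build the ARI request that creates an ExternalMedia channel."""
    query = urllib.parse.urlencode({
        "app": app,                      # Stasis application that owns the channel
        "external_host": external_host,  # host:port where your app receives RTP
        "format": audio_format,
        "encapsulation": "rtp",
        "transport": "udp",
        "direction": "both",
    })
    url = f"{ari_base.rstrip('/')}/channels/externalMedia?{query}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )


# Placeholder values for illustration only.
request = build_external_media_request(
    "http://127.0.0.1:8088/ari", "ariuser", "secret",
    app="voicebot", external_host="127.0.0.1:40000",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would create the channel on a running Asterisk instance; the application would then typically add that channel and the caller's channel to the same bridge so audio flows both ways.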

The Architectural Difference

AGI Model:


Caller ↔ Asterisk ↔ AGI (commands)

Media is controlled by blocking instructions.

ARI + ExternalMedia Model:


Caller ↔ Bridge ↔ ExternalMedia ↔ Application

Media flows continuously in both directions. Control is event-driven and asynchronous.

Production Reality

Many systems attempt to “extend” AGI for AI. They eventually hit these symptoms:

One-way audio
Playback delay
Missed interruptions
Conversation overlap artifacts
High perceived latency

These are not bugs. They are architectural mismatches.

Final Conclusion

AGI was designed for scripted IVR logic, not real-time AI conversation.

Real-time duplex voice requires:

Non-blocking media handling
Bidirectional RTP control
Bridge-level orchestration
Event-driven state management

Those capabilities exist in ARI + ExternalMedia, not in AGI.

If your goal is a natural, interruptible, human-like AI conversation, AGI is structurally incapable of delivering it.
