Why AGI Cannot Provide Real-Time Duplex Voice (Technical Limits)

MYLINEHUB Team • 2026-02-19 • 10 min

Why AGI is turn-based by design: buffering, blocking I/O, audio control limits, and why it cannot deliver true full-duplex conversational voice.

Many Asterisk-based AI calling systems start with AGI (Asterisk Gateway Interface). It is simple, familiar, and easy to attach to a dialplan. But the moment you try to build a natural, interruptible, real-time voice assistant, AGI becomes the bottleneck.

This is not a performance tuning issue. It is not a threading problem. It is not a scaling problem. It is a design limitation.

Real-time duplex voice requires continuous bidirectional RTP media control. AGI was never designed for that.

What Real-Time Duplex Actually Requires

A true duplex AI voice system must:

  • Stream caller audio continuously to AI (not in chunks after playback).
  • Generate AI speech while still receiving caller audio.
  • Detect caller interruptions instantly (barge-in detection).
  • Stop playback mid-utterance, at the next frame boundary, without waiting for the file to finish.
  • Maintain stable RTP timing (20ms cadence, monotonic timestamps).

In protocol terms, duplex means:

  • Two active RTP directions.
  • No blocking media primitives.
  • Application-level control over bridge and media state.

AGI does not provide this control surface.
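The 20ms cadence in the list above can be sketched as a pacing loop. This is a minimal illustration, not code from any Asterisk component: `paced_send` and `samples_per_frame` are hypothetical names, and `send` stands for whatever function actually transmits a frame. Deadlines are computed from the start time rather than by sleeping 20ms after each send, so scheduling jitter does not accumulate into drift.

```python
import time


def samples_per_frame(sample_rate_hz, frame_ms=20):
    """Number of audio samples in one frame, e.g. 160 at 8 kHz and 20 ms."""
    return sample_rate_hz * frame_ms // 1000


def paced_send(frames, send, frame_ms=20, clock=time.monotonic, sleep=time.sleep):
    """Call send(frame) once per frame interval without accumulating drift.

    `send`, `clock` and `sleep` are injected so the loop can be exercised
    without a real socket or real waiting (hypothetical design, for the sketch).
    """
    start = clock()
    for index, frame in enumerate(frames):
        # Absolute deadline for this frame, measured from the first frame.
        deadline = start + index * frame_ms / 1000
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)
        send(frame)
```

A blocking AGI command gives the script no point at which such a loop could run alongside audio capture, which is the gap the rest of this article describes.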

How AGI Actually Works

AGI is synchronous and command-based.

The call flow looks like this:


Caller → Dialplan → AGI script
AGI sends command → Asterisk executes → AGI waits
  

Each AGI command blocks until completion:

  • STREAM FILE blocks until playback finishes.
  • RECORD FILE blocks until recording ends.
  • GET DATA blocks waiting for input.

While one command is running, the script cannot issue another on the same channel: it is waiting for the reply to the first.
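The request/reply shape of an AGI command can be sketched as follows. This is a simplified illustration: `agi_command` is a hypothetical helper, the sound file name is a placeholder, and the sketch skips the block of `agi_` variables that Asterisk sends when the script starts. The point is the single blocking read: the reply line to STREAM FILE arrives only once playback has ended.

```python
import io
import sys


def agi_command(command, out=sys.stdout, inp=sys.stdin):
    """Send one AGI command and block until Asterisk replies.

    Replies look like '200 result=0 endpos=16000'. Until that line
    arrives, the script can do nothing else on this call.
    """
    out.write(command + "\n")
    out.flush()
    reply = inp.readline().strip()  # blocks for the whole operation
    if not reply:
        raise ConnectionError("AGI channel closed (caller hung up?)")
    code = int(reply[:3])
    result = None
    for token in reply.split():
        if token.startswith("result="):
            result = token[len("result="):]
    return code, result


if __name__ == "__main__":
    # Simulated exchange with a canned reply instead of a live Asterisk.
    fake_asterisk = io.StringIO("200 result=0 endpos=16000\n")
    sent = io.StringIO()
    print(agi_command('STREAM FILE hello-world ""', out=sent, inp=fake_asterisk))
```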

The Blocking Playback Problem

The biggest limitation: STREAM FILE is blocking.

When Asterisk plays audio through AGI:

  • The channel enters playback state.
  • AGI waits for playback to finish.
  • The script receives no caller audio while playback runs.

That means:

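  • You cannot detect spoken interruptions during playback.
  • You cannot reliably cut speech mid-stream: STREAM FILE accepts escape digits, so a DTMF keypress can end playback, but the caller's voice cannot.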
  • You cannot stream AI-generated audio in small RTP frames.

The system becomes turn-based:


AI speaks → Caller waits
Caller speaks → AI waits
  

This is half-duplex, not full duplex.

No Direct RTP Access

Duplex voice requires tight control over RTP:

  • Precise 20ms frame timing
  • Stable sequence numbers
  • Monotonic timestamp increments
  • Payload type consistency

AGI does not expose RTP streams.

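It only allows high-level media commands. You cannot inject raw RTP frames. You cannot read live RTP in parallel. You cannot control SSRC behavior. The EAGI variant does hand the script inbound audio on file descriptor 3, but that is decoded, receive-only audio with no path for sending frames back.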

Without RTP-level access, real duplex media engineering is impossible.
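To make the RTP requirements above concrete, here is a minimal sketch of the bookkeeping an application must do once it owns the media path. It assumes G.711 mu-law at 8 kHz (payload type 0, 160 samples per 20ms frame); `build_rtp_packet` and `packetize` are hypothetical names, and the sketch ignores padding, header extensions, and CSRC lists.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=0, marker=False):
    """Prepend a 12-byte RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def packetize(frames, ssrc, start_seq=0, start_ts=0, samples_per_frame=160):
    """Yield RTP packets with consecutive sequence numbers and timestamps.

    The sequence number advances by 1 per packet and wraps at 16 bits;
    the timestamp advances by the samples in each frame and wraps at 32 bits.
    The payload type and SSRC stay constant for the whole stream.
    """
    seq, ts = start_seq, start_ts
    for frame in frames:
        yield build_rtp_packet(seq, ts, ssrc, frame)
        seq = (seq + 1) & 0xFFFF
        ts = (ts + samples_per_frame) & 0xFFFFFFFF
```

None of these fields are reachable from an AGI script, which only ever names a file for Asterisk to play.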

Why Barge-In Fails Under AGI

True conversational AI must support barge-in:

  • User interrupts AI.
  • AI stops speaking instantly.
  • System switches to listening state immediately.

Under AGI:

  • Playback blocks execution.
  • Apart from DTMF escape digits, input detection happens only after playback completes.
  • Speech overlap cannot be resolved cleanly.

The result is unnatural conversation flow.
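For contrast, barge-in detection on a live inbound stream can be as simple as the sketch below. It is an energy-threshold illustration only, assuming 16-bit little-endian PCM frames; the threshold of 500 and the three-frame requirement are arbitrary placeholder values, and production systems typically use a proper voice activity detector. `BargeInDetector` is a hypothetical name.

```python
import math
import struct


def frame_rms(pcm):
    """Root-mean-square level of one 16-bit little-endian PCM frame."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class BargeInDetector:
    """Report barge-in after several consecutive loud frames.

    Requiring a run of frames avoids triggering on a single click or pop.
    """

    def __init__(self, threshold=500.0, frames_required=3):
        self.threshold = threshold
        self.frames_required = frames_required
        self._loud_run = 0

    def feed(self, pcm):
        """Feed one inbound frame; return True once barge-in is detected."""
        if frame_rms(pcm) >= self.threshold:
            self._loud_run += 1
        else:
            self._loud_run = 0
        return self._loud_run >= self.frames_required
```

With 20ms frames and three frames required, detection takes roughly 60ms of caller speech. The detector only works if frames keep arriving while the AI is speaking, which is exactly what a blocking playback command prevents.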

Latency Amplification

AGI introduces additional latency layers:

  • Dialplan → AGI process handoff
  • STDIN/STDOUT command parsing
  • Command execution round-trips

In AI voice systems, even 200–300ms delays degrade naturalness. AGI makes low-latency conversational timing extremely difficult.

Why ARI + ExternalMedia Solves This

Real duplex requires asynchronous media control.

ARI provides:

  • Event-driven channel state control
  • Bridge manipulation in real time
  • ExternalMedia channels for raw RTP streaming

With ExternalMedia:

  • Asterisk sends live RTP to your application.
  • Your application sends RTP back concurrently.
  • No blocking playback primitives are involved.

That enables:

  • True bidirectional streaming
  • Instant barge-in detection
  • Frame-level playback control
  • Stable RTP timing discipline
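As a rough sketch of how an ExternalMedia channel is requested, the code below builds the ARI call without sending it. ARI exposes this as a POST to `/channels/externalMedia` with `app`, `external_host`, and `format` parameters. The base URL, the application name `voicebot`, and the address `127.0.0.1:4000` are assumptions for illustration, and a real request also needs ARI credentials.

```python
from urllib.parse import urlencode
from urllib.request import Request


def external_media_url(base_url, app, external_host, audio_format="ulaw"):
    """Build the ARI URL that asks Asterisk to create an ExternalMedia channel.

    external_host is the host:port where the application listens for RTP.
    """
    query = urlencode(
        {"app": app, "external_host": external_host, "format": audio_format}
    )
    return f"{base_url.rstrip('/')}/channels/externalMedia?{query}"


def external_media_request(base_url, app, external_host, audio_format="ulaw"):
    """Wrap the URL in a POST request object (not sent, no credentials added)."""
    url = external_media_url(base_url, app, external_host, audio_format)
    return Request(url, method="POST")


if __name__ == "__main__":
    # Placeholder values; adjust to the actual ARI host and Stasis app name.
    print(external_media_url("http://localhost:8088/ari", "voicebot", "127.0.0.1:4000"))
```

Once the channel exists, the application adds it to a bridge with the caller's channel, and audio then flows over RTP to and from `external_host`.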

The Architectural Difference

AGI Model:


Caller ↔ Asterisk ↔ AGI (commands)
  

Media is controlled by blocking instructions.

ARI + ExternalMedia Model:


Caller ↔ Bridge ↔ ExternalMedia ↔ Application
  

Media flows continuously in both directions. Control is event-driven and asynchronous.
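The asynchronous model can be illustrated without any network code. In this sketch, playback and listening run as two concurrent asyncio tasks, and the listener cancels playback the moment it sees caller speech. The names `play`, `listen`, and `run_turn` are hypothetical, frames are opaque objects, and the `is_speech` callback stands in for a real detector; an actual application would read and write RTP over UDP instead of lists and queues.

```python
import asyncio


async def play(frames, sent, frame_seconds=0.02):
    """Send AI speech one frame at a time, yielding between frames."""
    for frame in frames:
        sent.append(frame)
        await asyncio.sleep(frame_seconds)


async def listen(inbound, is_speech, playback):
    """Consume caller frames; cancel playback as soon as speech is seen."""
    while True:
        frame = await inbound.get()
        if frame is None:  # end of the simulated call
            return False
        if is_speech(frame) and not playback.done():
            playback.cancel()
            return True


async def run_turn(tts_frames, caller_frames, is_speech, frame_seconds=0.02):
    """Run playback and listening concurrently; report what happened."""
    sent = []
    inbound = asyncio.Queue()
    playback = asyncio.create_task(play(tts_frames, sent, frame_seconds))
    listener = asyncio.create_task(listen(inbound, is_speech, playback))
    for frame in caller_frames:  # simulate caller audio arriving over time
        await inbound.put(frame)
        await asyncio.sleep(frame_seconds)
    await inbound.put(None)
    interrupted = await listener
    try:
        await playback
    except asyncio.CancelledError:
        pass  # playback was cut short by barge-in
    return interrupted, len(sent)
```

Calling `asyncio.run(run_turn(["tts"] * 50, ["quiet", "quiet", "speech"], lambda f: f == "speech"))` reports an interruption with only a few of the 50 frames sent. The AGI equivalent would have played all 50 before looking at caller input.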

Production Reality

Many systems attempt to “extend” AGI for AI. They eventually hit these symptoms:

  • One-way audio
  • Playback delay
  • Missed interruptions
  • Conversation overlap artifacts
  • High perceived latency

These are not bugs. They are architectural mismatches.

Final Conclusion

AGI was designed for scripted IVR logic, not real-time AI conversation.

Real-time duplex voice requires:

  • Non-blocking media handling
  • Bidirectional RTP control
  • Bridge-level orchestration
  • Event-driven state management

Those capabilities exist in ARI + ExternalMedia, not in AGI.

If your goal is a natural, interruptible, human-like AI conversation, AGI is structurally incapable of delivering it.
