Scaling VoiceBridge for Hundreds of Concurrent Calls

MYLINEHUB Team • 2026-02-10 • 13 min

Scaling patterns for VoiceBridge: CPU/RTP costs, horizontal scaling, port planning, and how to run hundreds of duplex AI calls reliably.

Scaling VoiceBridge for Hundreds of Concurrent Calls: What Actually Scales

When you move from 20 calls to 200+ calls, the bottleneck is usually not Asterisk. Asterisk/FreePBX is the call-control and SIP termination layer. The component that becomes expensive in AI calling is the always-on media + AI streaming pipeline — which is exactly what VoiceBridge owns.

VoiceBridge is a Java real-time media service that maintains per-call RTP endpoints and per-call AI streaming sessions. That makes it scalable like other backend services (horizontal replicas, container orchestration, autoscaling) — but with one major twist: RTP is UDP, and UDP needs deliberate networking and port planning.

Repo reference (all file paths in this article live here):
https://github.com/mylinehub/omnichannel-crm/tree/main/mylinehub-voicebridge

Canonical architecture background:
https://mylinehub.com/articles/mylinehub-voicebridge-architecture

The Scaling Unit Is Not “Asterisk Calls” — It’s “VoiceBridge Sessions”

VoiceBridge is not a stateless HTTP service. Each call is a long-lived session with:

  • ARI channel lifecycle events (start, hangup, bridge changes, etc.)
  • At least one bound UDP socket for RTP (often two legs for duplex media)
  • A persistent AI streaming connection (typically WebSocket per call)
  • Barge-in / truncation state and timing discipline
  • Optional recording and metrics pipelines

That “session state” is why scaling VoiceBridge is fundamentally a horizontal scaling problem: you scale by running more VoiceBridge instances, each instance owning a subset of active calls.

In code, this session-centric architecture is visible via:

  • src/main/java/com/mylinehub/voicebridge/session/CallSession.java (the conceptual per-call container: RTP endpoints, AI client, state, correlation IDs)
  • src/main/java/com/mylinehub/voicebridge/ari/impl/AriBridgeImpl.java (builds the call media graph and maintains the duplex bridge wiring)
  • src/main/java/com/mylinehub/voicebridge/ari/impl/ExternalMediaManagerImpl.java (creates and manages ExternalMedia channels and RTP legs)

What Limits You First: UDP Ports, CPU per Call, or AI Latency?

In real deployments, “hundreds of calls” fails for one of these reasons:

  • RTP port exhaustion (not enough UDP ports allocated per instance, or firewall/container does not expose the range)
  • CPU saturation (codec conversion, resampling, packet pacing, encryption, JSON parsing for AI streams)
  • Network saturation (RTP bandwidth + AI streaming bandwidth + recording uploads)
  • AI backend latency spikes (STT/TTS + streaming response time increases under load)
  • GC / memory pressure (buffer churn from audio frames, per-call objects, logging overhead)

The correct scaling approach is to treat VoiceBridge like a performance-sensitive media server: plan ports, pin sessions, cap per-instance concurrency, and scale replicas horizontally.

RTP Port Planning: The Non-Negotiable Foundation

VoiceBridge binds UDP ports locally and uses those ports for RTP exchange with Asterisk. If you don’t plan ports correctly, your “scaling” fails before it starts.

The port strategy is implemented in:

  • src/main/java/com/mylinehub/voicebridge/rtp/RtpPortAllocator.java (allocates ports per call/leg per instance; dictates how many calls one instance can safely host)
  • src/main/resources/application.properties (contains rtp.bind.port and server bind port defaults)
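To make the idea of a per-instance port pool concrete, here is a minimal sketch of a fixed-range allocator. It is not the repository's RtpPortAllocator; the class name PortPoolSketch and its methods are hypothetical, and it only tracks port numbers without binding sockets.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.OptionalInt;
import java.util.Set;

/** Hypothetical fixed-range port pool; not the repository's RtpPortAllocator. */
final class PortPoolSketch {
    private final Deque<Integer> free = new ArrayDeque<>();
    private final Set<Integer> leased = new HashSet<>();

    PortPoolSketch(int firstPort, int lastPort) {
        if (firstPort > lastPort) {
            throw new IllegalArgumentException("empty range");
        }
        for (int p = firstPort; p <= lastPort; p++) {
            free.addLast(p);
        }
    }

    /** Leases one port, or returns empty when the range is exhausted. */
    synchronized OptionalInt lease() {
        Integer port = free.pollFirst();
        if (port == null) {
            return OptionalInt.empty();
        }
        leased.add(port);
        return OptionalInt.of(port);
    }

    /** Returns a port to the back of the queue so it is not reused immediately. */
    synchronized void release(int port) {
        if (leased.remove(port)) {
            free.addLast(port);
        }
    }

    synchronized int available() {
        return free.size();
    }
}
```

Returning released ports to the back of the queue is a deliberate choice in this sketch: it delays reuse, which gives late packets from a finished call time to stop arriving before the port is handed to a new call.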

How to think about “ports per call”

Full duplex in production typically means at least two RTP directions that must remain live: caller → VoiceBridge and VoiceBridge → Asterisk (AI voice injection). Even if some modes reuse channels, your safe planning model is:

  • 2 UDP ports per call minimum (one per RTP leg)
  • + headroom for retries, re-invites, hangup races, port collisions

Example: if one VoiceBridge instance is meant to handle 200 concurrent calls safely, expect to need at least 400 UDP ports for that instance, plus headroom, depending on your leg design.
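The arithmetic behind that example can be written down as a small helper. This is a sketch under stated assumptions: the class RtpPortPlan, its method names, and the 25% headroom figure used in the usage note are illustrative and do not come from the repository.

```java
/** Hypothetical helper: estimates the UDP port range one instance needs. */
final class RtpPortPlan {
    private RtpPortPlan() {}

    /**
     * @param maxCalls        per-instance concurrency cap
     * @param portsPerCall    RTP legs per call (2 in the planning model above)
     * @param headroomPercent extra ports for retries, re-invites and hangup races
     */
    static int portsNeeded(int maxCalls, int portsPerCall, int headroomPercent) {
        if (maxCalls < 0 || portsPerCall < 1 || headroomPercent < 0) {
            throw new IllegalArgumentException("inputs out of range");
        }
        long base = (long) maxCalls * portsPerCall;
        // Integer ceiling of base * headroomPercent / 100.
        long headroom = (base * headroomPercent + 99) / 100;
        return Math.toIntExact(base + headroom);
    }

    /** Last port of an inclusive range that starts at firstPort. */
    static int lastPort(int firstPort, int portCount) {
        int last = firstPort + portCount - 1;
        if (firstPort < 1024 || last > 65535) {
            throw new IllegalArgumentException("range leaves 1024..65535");
        }
        return last;
    }
}
```

With 200 calls, 2 ports per call and an assumed 25% headroom, portsNeeded returns 500, and a range starting at port 20000 would end at 20499.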

Container consequence: you must expose a UDP range, not a single port

The repository contains container scaffolding:

  • docker/Dockerfile
  • docker-compose.yml
  • .env.example (shows runtime env pattern like RTP_BIND_PORT)

A demo compose file may map a single RTP port for testing. That is not enough for scale. At scale, you must publish (or allow) the full UDP port range that VoiceBridge will allocate from.

RTP Under NAT and Firewalls: Scaling Without Breaking Duplex

When you run multiple VoiceBridge instances, you increase the number of RTP endpoints and UDP sockets. That increases the probability of NAT and firewall mistakes causing one-way audio or “random failures under load”.

The NAT-safe behavior is built into:

  • src/main/java/com/mylinehub/voicebridge/rtp/RtpSymmetricEndpoint.java (learns the actual remote RTP endpoint based on received packets; crucial for symmetric NAT environments)
  • src/main/java/com/mylinehub/voicebridge/rtp/RtpPacketizer.java (strict RTP headers + timestamp/sequence pacing; prevents jitter-buffer collapse on busy systems)
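The idea of learning the remote endpoint from received packets can be sketched in a few lines. This is not the repository's RtpSymmetricEndpoint; the class SymmetricRemoteSketch is hypothetical, and it accepts any source address, whereas a production version would presumably restrict which sources it trusts.

```java
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of symmetric RTP learning; not the repository class. */
final class SymmetricRemoteSketch {
    private final AtomicReference<InetSocketAddress> remote;

    /** @param signalled address announced by signalling; may be wrong behind NAT */
    SymmetricRemoteSketch(InetSocketAddress signalled) {
        this.remote = new AtomicReference<>(signalled);
    }

    /** Call for every received packet; returns true when the send target changed. */
    boolean onPacketFrom(InetSocketAddress source) {
        InetSocketAddress previous = remote.getAndSet(source);
        return !source.equals(previous);
    }

    /** Where outbound RTP should be sent right now. */
    InetSocketAddress sendTarget() {
        return remote.get();
    }
}
```

The sender starts with the signalled address and switches to the address it actually receives media from, which is what keeps audio flowing when a NAT rewrites the source port.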

Scaling principle: Do not widen firewall rules as you scale. Instead, add replicas and keep each replica’s RTP range tight and source-restricted.

CPU Model: What Actually Burns CPU per Call in VoiceBridge

1) Packet processing + timing discipline

Every active call generates a steady flow of RTP packets. VoiceBridge must parse, route, and repacketize audio while maintaining correct timing. This work grows linearly with call count.

Implementation reference: src/main/java/com/mylinehub/voicebridge/rtp/RtpPacketizer.java
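As an illustration of what "timing discipline" means at the packet level, the sketch below writes the 12-byte fixed RTP header defined in RFC 3550 and advances the sequence number and timestamp per frame. It is not the repository's RtpPacketizer: the class RtpHeaderSketch is hypothetical, it assumes 20 ms PCMU frames (160 samples at 8 kHz, payload type 0), and it starts both counters at zero for readability, whereas RFC 3550 recommends random initial values.

```java
import java.nio.ByteBuffer;

/** Hypothetical RTP header writer (RFC 3550 fixed header, no CSRC, no extension). */
final class RtpHeaderSketch {
    private final int payloadType;
    private final int ssrc;
    private final int samplesPerPacket;
    private int sequence;
    private long timestamp;

    RtpHeaderSketch(int payloadType, int ssrc, int samplesPerPacket) {
        this.payloadType = payloadType & 0x7F;
        this.ssrc = ssrc;
        this.samplesPerPacket = samplesPerPacket;
    }

    /** Wraps one audio frame and advances sequence and timestamp for the next. */
    byte[] next(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(12 + payload.length); // big-endian by default
        buf.put((byte) 0x80);          // version 2, no padding, no extension, CC = 0
        buf.put((byte) payloadType);   // marker bit clear
        buf.putShort((short) sequence);
        buf.putInt((int) timestamp);
        buf.putInt(ssrc);
        buf.put(payload);
        sequence = (sequence + 1) & 0xFFFF;                        // 16-bit wrap
        timestamp = (timestamp + samplesPerPacket) & 0xFFFFFFFFL;  // 32-bit wrap
        return buf.array();
    }
}
```

The timestamp advances by the number of samples in the frame, not by wall-clock time, so a sender that falls behind must still emit consistent timestamps or the receiver's jitter buffer will misjudge the stream.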

2) Codec conversion and resampling

Telephony often uses G.711 (PCMU/PCMA). AI engines commonly prefer PCM16. Conversion + resampling can dominate CPU under load.

Practical repo reference: src/main/java/com/mylinehub/voicebridge/audio/resampler/FfmpegResampler.java (shows that the pipeline must handle real conversion workflows; conversion cost must be accounted for in sizing)
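To show the kind of per-sample work involved, here is a generic G.711 mu-law (PCMU) to PCM16 decoder. It is a sketch of the standard expansion, not the repository's conversion path, and the class name MuLawSketch is hypothetical. It covers decoding only; resampling from 8 kHz to whatever rate the AI backend expects is a separate step.

```java
/** G.711 mu-law to 16-bit linear PCM; a generic sketch, not the repository's code path. */
final class MuLawSketch {
    private static final int BIAS = 0x84;

    private MuLawSketch() {}

    static short decodeSample(byte encoded) {
        int u = ~encoded & 0xFF;            // mu-law bytes are stored bit-inverted
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
        return (short) ((u & 0x80) != 0 ? -magnitude : magnitude);
    }

    /** Decodes a PCMU payload into one PCM16 sample per input byte. */
    static short[] decode(byte[] payload) {
        short[] pcm = new short[payload.length];
        for (int i = 0; i < payload.length; i++) {
            pcm[i] = decodeSample(payload[i]);
        }
        return pcm;
    }
}
```

At 8,000 samples per second per direction this loop is cheap for one call, but it runs continuously for every active call, which is why conversion has to be included in per-call CPU measurements.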

3) AI streaming concurrency

Each call may keep a persistent streaming session open to the AI backend. That means: sockets, buffers, JSON parsing, event-driven audio frame management — continuously.

  • src/main/java/com/mylinehub/voicebridge/ai/impl/RealtimeAiClientImpl.java
  • src/main/java/com/mylinehub/voicebridge/ai/impl/GoogleLiveAiClientImpl.java
  • src/main/java/com/mylinehub/voicebridge/ai/impl/ExternalBotWsClientImpl.java

4) Barge-in (interrupt handling)

Barge-in is computationally expensive because you must detect interruptions quickly, truncate AI speech, and keep RTP continuity stable.

  • src/main/java/com/mylinehub/voicebridge/ai/impl/OpenAiRealtimeTruncateManager.java

Scaling conclusion: capacity is not a “calls per second” figure. It is the sum of steady per-call CPU and steady per-call I/O. So you scale VoiceBridge exactly like a media worker pool: cap calls per pod, then add pods.

The Only Scaling Strategy That Works: Horizontal Replicas with Call Affinity

A call cannot bounce between VoiceBridge replicas mid-session because:

  • RTP sockets are bound on a specific instance
  • Symmetric endpoint learning is instance-local
  • AI streaming sessions are instance-local
  • Barge-in and truncation state is instance-local

Therefore, the key scaling rule is: once a call is assigned to a VoiceBridge instance, it must remain pinned there until hangup.

This is “session affinity”, but for RTP/UDP + WebSocket media, not for browser cookies.
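The pinning rule can be sketched as a small registry in which the first assignment for a call wins and is only dropped at hangup. The class CallAffinitySketch and the replica identifiers in the usage note are hypothetical, and a real deployment would need this state somewhere all routing components can reach.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical call-to-instance registry illustrating call affinity. */
final class CallAffinitySketch {
    private final ConcurrentMap<String, String> ownerByCall = new ConcurrentHashMap<>();

    /** First assignment wins; later lookups for the same call return the same owner. */
    String ownerFor(String callId, Supplier<String> chooser) {
        return ownerByCall.computeIfAbsent(callId, id -> chooser.get());
    }

    Optional<String> currentOwner(String callId) {
        return Optional.ofNullable(ownerByCall.get(callId));
    }

    /** Drop the pin only at hangup. */
    void onHangup(String callId) {
        ownerByCall.remove(callId);
    }
}
```

Calling ownerFor("call-1", ...) a second time with a different chooser still returns the original owner, which is the behaviour the pinning rule requires.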

Containerization: How to Run VoiceBridge as a Proper Java Service

VoiceBridge is already container-friendly by design (Spring Boot service style). The repository includes:

  • docker/Dockerfile (image build for the VoiceBridge module)
  • docker-compose.yml (local orchestration pattern)
  • .env.example (runtime env variables pattern)
  • src/main/resources/application.properties (runtime knobs like server port and RTP bind port)

Production container rules

  • Always set JVM memory explicitly (avoid “it worked on my laptop” GC surprises).
  • Mount configuration via env vars or config maps (don’t bake secrets into images).
  • Expose only the ports you need (app port + UDP RTP range).
  • Keep logs structured; avoid debug logs at high concurrency.

Kubernetes Reality: UDP + RTP Changes the Typical Load Balancer Story

Kubernetes can absolutely run VoiceBridge at scale — but you must respect that RTP is UDP and that each call needs stability.

What Kubernetes is good at here

  • Running many replicas of the Java service
  • Restart policies and self-healing
  • Resource limits (CPU/memory) per pod
  • Horizontal Pod Autoscaling (HPA) based on CPU/metrics
  • Rolling updates (with safe draining patterns)

What Kubernetes will NOT magically solve

  • RTP cannot be “randomly load balanced” per packet across pods
  • A single call must stay pinned to one pod (session affinity at call assignment time)
  • Node-level UDP networking must be designed so PBX can reliably reach the pod’s RTP ports

Three practical Kubernetes patterns for RTP workloads

  • NodePort UDP range + explicit routing: publish a UDP range via NodePort and route PBX RTP to the correct node using an explicit node-to-pod mapping. Works, but needs careful ops discipline.
  • HostNetwork pods: VoiceBridge binds directly on the node network interface; ports become node ports. Simplifies RTP reachability, but reduces density and requires careful port planning per node.
  • DaemonSet per node: one VoiceBridge instance per node, predictable port ranges per node, easier RTP planning. Scale by adding nodes.

Which one you choose depends on your infrastructure and firewall ownership — but all three share the same core rule: RTP must be deterministic and reachable.

Load Distribution: How Calls Get Assigned to Replicas

To scale VoiceBridge, you must decide how a new inbound call chooses which VoiceBridge instance will own it. There are two common production strategies:

Strategy A: Dispatcher Assignment

A dispatcher (or a lightweight routing layer) assigns a call to a specific VoiceBridge replica before creating ExternalMedia. The chosen replica creates RTP endpoints and owns the session until hangup.

Why this works: it matches the session-centric design of CallSession.java and the ARI media graph creation in AriBridgeImpl.java.
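One way a dispatcher could choose a replica is to pick the one with the fewest active calls that is still under its cap. The sketch below assumes the dispatcher already knows each replica's call count; the class LeastLoadedPickerSketch and the replica identifiers are hypothetical, and the article does not prescribe this particular policy.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical dispatcher step: pick the replica with the fewest active calls. */
final class LeastLoadedPickerSketch {
    private LeastLoadedPickerSketch() {}

    /**
     * @param activeCalls   replica id to current call count
     * @param maxPerReplica hard per-replica cap
     * @return chosen replica, or empty when every replica is full
     */
    static Optional<String> pick(Map<String, Integer> activeCalls, int maxPerReplica) {
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : activeCalls.entrySet()) {
            int load = e.getValue();
            if (load >= maxPerReplica) {
                continue; // replica is at capacity
            }
            // Ties are broken by replica id so the choice is deterministic.
            boolean better = load < bestLoad
                    || (load == bestLoad && e.getKey().compareTo(best) < 0);
            if (better) {
                best = e.getKey();
                bestLoad = load;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

An empty result means every replica is full, which is the signal to reject the call or scale out rather than overload an instance.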

Strategy B: “Static Sharding”

You shard by DID ranges, trunk groups, customer accounts, or Asterisk instances. Each shard points to a dedicated VoiceBridge deployment (or namespace).

This is operationally simple and extremely stable under load. It is often the easiest way to reach 500–2000 calls because you avoid complex dynamic routing.
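Static sharding by DID range can be as simple as a prefix table where the longest matching prefix wins. The sketch below is illustrative only: the class DidShardTableSketch, the prefixes, and the deployment names in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical static shard table: longest matching DID prefix wins. */
final class DidShardTableSketch {
    private final Map<String, String> deploymentByPrefix;

    DidShardTableSketch(Map<String, String> deploymentByPrefix) {
        this.deploymentByPrefix = new HashMap<>(deploymentByPrefix);
    }

    Optional<String> deploymentFor(String did) {
        // Try the full number first, then progressively shorter prefixes.
        for (int len = did.length(); len > 0; len--) {
            String hit = deploymentByPrefix.get(did.substring(0, len));
            if (hit != null) {
                return Optional.of(hit);
            }
        }
        return Optional.empty();
    }
}
```

For example, with prefixes "9111" mapped to "vb-north" and "91" mapped to "vb-default", the number 911122223333 resolves to "vb-north" and 912200000000 resolves to "vb-default". Because the table is static, the mapping can live in configuration and needs no runtime coordination.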

Scaling Numbers: A Practical Capacity Planning Method

Avoid fake “calls-per-core” promises. Instead, capacity plan using measurable components:

  • CPU per call: includes RTP processing + codec conversion + AI session overhead
  • Memory per call: audio buffers, AI buffers, session objects, logging contexts
  • UDP ports per call: based on your duplex leg design and RtpPortAllocator behavior
  • Bandwidth per call: RTP in + RTP out + AI traffic + optional recording uploads

How to size without guessing

  • Run a load test at 20 calls and measure CPU, memory, GC pressure, packet loss and jitter.
  • Run at 50 calls and confirm linearity. (If it’s not linear, find contention: locks, logs, network, GC.)
  • Set a conservative per-instance max calls (example: 60 calls per pod).
  • Scale by adding pods for 200/300/500 calls.
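The sizing steps above reduce to two small calculations once you have a measured CPU cost per call. The sketch below is hedged accordingly: the class CapacityPlanSketch is hypothetical, and the figures in the usage note (4000 millicores per pod, 40 millicores per call, 60% target utilisation) are placeholder inputs, not measurements of VoiceBridge.

```java
/** Hypothetical sizing helper driven by load-test measurements. */
final class CapacityPlanSketch {
    private CapacityPlanSketch() {}

    /** Calls one pod can host given measured CPU per call and a utilisation target. */
    static int maxCallsPerPod(int podCpuMillicores, int cpuMillicoresPerCall,
                              int targetUtilisationPercent) {
        if (cpuMillicoresPerCall <= 0) {
            throw new IllegalArgumentException("cpuMillicoresPerCall must be positive");
        }
        long budget = (long) podCpuMillicores * targetUtilisationPercent / 100;
        return (int) (budget / cpuMillicoresPerCall);
    }

    /** Pods needed for a target concurrency, rounded up. */
    static int podsFor(int targetCalls, int maxCallsPerPod) {
        if (maxCallsPerPod <= 0) {
            throw new IllegalArgumentException("maxCallsPerPod must be positive");
        }
        return (targetCalls + maxCallsPerPod - 1) / maxCallsPerPod;
    }
}
```

With the placeholder inputs, maxCallsPerPod(4000, 40, 60) gives 60 calls per pod, and podsFor yields 4, 5 and 9 pods for 200, 300 and 500 calls respectively. Replace the inputs with your own load-test numbers before relying on the result.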

This avoids the worst production failure mode: a single instance running “too many calls” and collapsing every session at once.

Operational Patterns That Keep Hundreds of Calls Stable

1) Hard caps per instance

Implement admission control: if a pod is at capacity, route the next call elsewhere. A “soft” best-effort approach leads to overload spirals.
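A hard cap can be enforced with a single atomic counter. This is a minimal sketch, assuming the cap is known at startup; the class AdmissionGateSketch is hypothetical and is not taken from the repository.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-instance admission gate with a hard cap. */
final class AdmissionGateSketch {
    private final int maxCalls;
    private final AtomicInteger active = new AtomicInteger();

    AdmissionGateSketch(int maxCalls) {
        this.maxCalls = maxCalls;
    }

    /** Reserves a slot, or returns false so the caller can route elsewhere. */
    boolean tryAdmit() {
        while (true) {
            int current = active.get();
            if (current >= maxCalls) {
                return false;
            }
            if (active.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Call once per admitted call, at hangup; never drops below zero. */
    void release() {
        active.updateAndGet(n -> Math.max(0, n - 1));
    }

    int activeCalls() {
        return active.get();
    }
}
```

The compare-and-set loop means two calls arriving at the same moment cannot both take the last slot, so the cap holds under concurrent admission.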

2) Safe rolling deploys (no mid-call chaos)

For Kubernetes, do not kill pods that own live RTP sessions without draining. Production pattern:

  • Mark pod as “not accepting new calls”
  • Wait for existing calls to finish (or timeout gracefully)
  • Then terminate pod
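The three drain steps can be sketched as a small state holder that a shutdown hook might use. The class DrainStateSketch is hypothetical; how the "not accepting new calls" signal reaches the dispatcher (a readiness probe, a registry flag, or something else) is left open here.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical drain state for a replica that is about to be terminated. */
final class DrainStateSketch {
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final AtomicInteger active = new AtomicInteger();

    boolean tryStartCall() {
        if (draining.get()) {
            return false;
        }
        active.incrementAndGet();
        // Re-check: draining may have started between the test and the increment.
        if (draining.get()) {
            active.decrementAndGet();
            return false;
        }
        return true;
    }

    void endCall() {
        active.decrementAndGet();
    }

    void startDraining() {
        draining.set(true);
    }

    /** Polls until no calls remain or the timeout elapses; true means safe to stop. */
    boolean awaitIdle(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (active.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A false result from awaitIdle means calls were still live when the timeout expired, so the operator must decide whether to extend the grace period or end those calls deliberately.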

3) Observability per pod and per call

When scaling, you must be able to answer: “Which pod owns this call?” and “Is RTP healthy?”

VoiceBridge already includes observability hooks in src/main/java/com/mylinehub/voicebridge/metrics/VoiceBridgeMetricsService.java. Use these metrics to drive alerts and autoscaling decisions.

4) Keep firewall rules strict while scaling

Scale by adding replicas — not by widening UDP exposure. Your security posture should remain stable as concurrency grows.

Summary: The Intended Way to Scale VoiceBridge

  • VoiceBridge scales like a Java media worker pool: cap calls per instance, add replicas.
  • Scaling success depends on: UDP port planning (see RtpPortAllocator), NAT-safe endpoints (see RtpSymmetricEndpoint), and timing discipline (see RtpPacketizer).
  • Kubernetes works well if you respect RTP realities: deterministic ports, reachable networking, and call pinning to a single pod.
  • Asterisk is not the main scaling lever here. The real scaling lever is VoiceBridge compute + container orchestration.

Repo: https://github.com/mylinehub/omnichannel-crm/tree/main/mylinehub-voicebridge
