WebRTC Scaling

Scaling a Public “Talk to Bot” Button to Thousands of Concurrent Users

MYLINEHUB Team • 2026-02-23 • 18 min

Production scaling guide for WebRTC voice buttons covering DTLS cost, TURN capacity planning, horizontal scaling, and avoiding Asterisk bottlenecks.

WebRTC • Click-to-Call Button • AI Voice Bot • Janus/LiveKit/mediasoup • TURN • Asterisk/FreeSWITCH

Scaling a “Talk to Bot” WebRTC Button to Thousands of Users (Real Architecture, Costs, and Pitfalls)

A “Talk to Bot” button looks simple: user clicks, browser asks for mic permission, and suddenly they are talking to an AI voice bot. But when you scale from 10 testers to 10,000 daily users (or thousands of concurrent users), everything changes.

The biggest mistake teams make is treating WebRTC like a normal API call: “If our web server scales, the voice button will scale.” WebRTC is real-time media — latency-sensitive, bandwidth-heavy, NAT-sensitive, and it requires different scaling and monitoring than typical web apps.

What you will get from this guide:

  • A novice-friendly explanation first (layman examples), then deeper engineering detail.
  • Concrete scaling patterns: TURN, gateway/SFU sizing, region placement, and autoscaling triggers.
  • Capacity planning tables (bandwidth math, concurrency, CPU/RAM considerations).
  • Production checklists: ports, NAT, observability, load testing, and “why it works in dev but fails in prod”.
  • Where SIP servers (Asterisk/FreeSWITCH) fit, and when to introduce Janus / LiveKit / mediasoup.

1) Layman view: what “scaling voice” really means

Scaling a web page vs scaling a live call

A web page is like sending a PDF to someone — you can cache it, retry it, and deliver it eventually. A live voice call is like a walkie-talkie conversation — if packets arrive late, the conversation feels broken.

  • Web: “If it’s slow, user waits.”
  • Voice: “If it’s slow, user hangs up.”

Scaling voice is mostly about 4 things

  1. Network reachability (NAT/firewalls) → solved by TURN
  2. Bandwidth (media traffic) → planning + multi-region
  3. Real-time processing (STT/TTS/AI) → concurrency + backpressure
  4. Operational visibility → monitoring that tells you what users experience

Rule of thumb: if your click-to-call works for 95% of users, it will still feel “broken”, because one user in twenty hits a failure. People remember voice failures more than slow page loads.

Your target is not “mostly works”. Your target is “works everywhere”.

2) The components of a click-to-call AI voice stack

Browser UI • Signaling • ICE/STUN/TURN • Media Server/Gateway • SIP/PBX • AI Voice Pipeline • Observability

A production voice button is not one server. It’s a chain. When calls fail, you must know which link failed.

| Layer | Example tech | Why it exists | Scaling concern |
| --- | --- | --- | --- |
| Browser UI | JS + WebRTC APIs | Mic capture, UX, permissions | Device diversity, permission flows |
| Signaling | HTTPS/WSS | Exchange SDP and ICE candidates | State, auth, retries |
| NAT traversal | STUN/TURN (coturn) | Reachability across networks | Bandwidth + regional placement |
| Media gateway/SFU | Janus / LiveKit / mediasoup | Anchor media; bridge to SIP or AI pipeline | CPU, memory, packet pacing |
| SIP/PBX (optional) | Asterisk / FreeSWITCH | Enterprise calling, extensions, PSTN | RTP ports, transcoding cost |
| AI voice | STT + LLM + TTS | Understand and respond | Latency budgets, concurrency limits |
| Observability | Metrics/logs/traces + QoE | Know why calls fail | Cardinality, sampling |

Scaling tip: split “control plane” from “media plane”.

  • Control plane = signaling, auth, session routing (scales like web)
  • Media plane = RTP/SRTP packets, TURN relay traffic (scales like networking)

3) Topologies: P2P, Gateway (Janus), SFU (LiveKit/mediasoup)

3.1 P2P model (direct peer-to-peer)

In the P2P model, the browser sends media directly to another peer. This is great for two humans on good networks. It is a poor fit for “browser ↔ bot” because your bot is not a browser and you still need an anchor server.

  • Pros: low infra cost when it works
  • Cons: NAT complexity, hard to integrate SIP, hard to measure QoE

3.2 Gateway model (Janus as anchor)

Browser connects to Janus. Janus terminates WebRTC and provides plugin-based routing: SIP plugin to Asterisk, or custom logic to an AI pipeline.

  • Pros: simple mental model, great SIP bridge story
  • Cons: you must size gateways and TURN; multi-region planning

3.3 SFU model (LiveKit/mediasoup)

Browser connects to an SFU. SFU optimizes multi-party calls and routing. For a bot button, you’re often still doing 1:1, but SFUs offer strong tooling and scaling patterns.

  • Pros: mature scaling patterns, multi-party ready
  • Cons: SIP bridging can be extra work; complexity may be higher

Figure: Practical scaling: anchor media on a server and rely on TURN as fallback. Browser (mic + WebRTC) → TURN (relay when needed; UDP/TCP/TLS 443) → Media anchor (Janus / LiveKit / mediasoup; SRTP ↔ RTP, routing) → AI + PBX (STT/TTS + Asterisk/FreeSWITCH).

Practical recommendation for “Talk to Bot”:

  • Use an anchor server (Janus or SFU) rather than “direct to bot”.
  • Use TURN as a required fallback path (and size it).
  • Split control-plane vs media-plane scaling.

4) TURN at scale: the reliability layer you must plan for

At small scale, you might get away with only STUN — most tests happen on permissive networks. At real scale, your user base includes corporate firewalls, carrier NAT, and “weird routers”. TURN is what converts “sometimes works” into “works reliably”.

4.1 TURN is not just “one server”

  • Placement matters: TURN far away increases latency and jitter.
  • Bandwidth matters: TURN relays media — it carries your traffic.
  • Protocol coverage matters:
    • TURN/UDP (best)
    • TURN/TCP (fallback)
    • TURN/TLS on 443 (enterprise networks)

4.2 TURN usage rate (why you must measure it)

Not all users will use TURN. ICE gathers host, srflx, and relay candidates and prefers direct pairs (host/srflx) when they pass connectivity checks, so a healthy system will have a mix:

  • Some sessions: direct (host/srflx)
  • Some sessions: relay (TURN)

Your cost depends on % relay. Without monitoring, you are flying blind.
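
One way to put a number on % relay is to have each client report the selected candidate type at the end of the call and aggregate the values on the backend. The sketch below assumes that reporting already exists; the function name and the sample list are illustrative only.

```python
from collections import Counter

def candidate_type_share(selected_types):
    """Return the fraction of sessions per selected ICE candidate type."""
    counts = Counter(selected_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctype: n / total for ctype, n in counts.items()}

# Hypothetical per-call values reported by clients (one entry per finished call).
sample = ["srflx", "relay", "relay", "host", "relay"]
share = candidate_type_share(sample)
print(f"relay share: {share.get('relay', 0.0):.0%}")
```

Tracking this share per region and per day is usually enough to explain most changes in TURN bandwidth.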

| Decision | Option | Impact | When to choose |
| --- | --- | --- | --- |
| TURN policy | TURN as fallback | Lower cost, slightly more failure risk if a direct path passes ICE checks but then performs poorly | When you have good monitoring and can iterate |
| TURN policy | Force relay (iceTransportPolicy=relay) | Highest reliability, higher cost, consistent behavior | When you must be “enterprise reliable” or debugging |
| TURN geography | Single region | Simple, higher latency for distant users | Pilot / local audience |
| TURN geography | Multi-region | Lower latency globally, operational complexity | Global users, higher concurrency |

Minimum viable production TURN:

  • At least 2 TURN nodes (HA), ideally behind DNS-based routing.
  • TURN/UDP + TURN/TLS 443 enabled.
  • Relay port range opened on firewall.
  • Ephemeral credentials issued by backend (no static creds in JS).
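
A minimal sketch of backend-issued ephemeral credentials, assuming the TURN server is coturn configured for shared-secret authentication (the `use-auth-secret` and `static-auth-secret` options). In that scheme the username carries an expiry timestamp and the password is an HMAC of the username. The host name, secret, user id, and function names here are placeholders, not values from this guide.

```python
import base64
import hashlib
import hmac
import time

def make_turn_credentials(shared_secret, user_id, ttl_seconds=600, now=None):
    """Build time-limited TURN credentials (shared-secret scheme).

    username = "<expiry unix time>:<user id>"
    password = base64(HMAC-SHA1(shared_secret, username))
    """
    issued_at = int(time.time() if now is None else now)
    username = f"{issued_at + ttl_seconds}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return {"username": username, "credential": base64.b64encode(digest).decode()}

def ice_servers_for(turn_host, creds):
    """Return an iceServers list covering TURN/UDP, TURN/TCP and TURN/TLS on 443."""
    return [{
        "urls": [
            f"turn:{turn_host}:3478?transport=udp",
            f"turn:{turn_host}:3478?transport=tcp",
            f"turns:{turn_host}:443?transport=tcp",
        ],
        "username": creds["username"],
        "credential": creds["credential"],
    }]

# Placeholder secret and host; the backend would return this JSON to the browser.
creds = make_turn_credentials("change-me", "user42")
ice_servers = ice_servers_for("turn.example.com", creds)
```

The browser passes the returned list as `iceServers` when it creates the peer connection, so the shared secret itself never appears in JavaScript.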

Related reading: Why WebRTC Calls Fail Behind NAT (and How TURN Fixes It)

5) Capacity planning: bandwidth, concurrency, CPU, cost

“Thousands of users” can mean: thousands of daily visitors (low concurrency) or thousands of concurrent calls (high concurrency). Concurrency is what drives infra sizing.

5.1 Define your traffic reality

| Metric | Example | Why it matters |
| --- | --- | --- |
| Daily unique visitors | 50,000/day | Controls auth/signaling scale |
| Click-to-call conversion | 2% | How many sessions become calls |
| Average call duration | 3 minutes | Controls concurrent load |
| Peak factor | 5× average | Determines worst-case sizing |
| % calls using TURN | 30–70% | Determines TURN bandwidth cost |

Without these numbers, you can’t size TURN/media servers correctly.

5.2 Bandwidth is the first hard limit

Even “audio only” has overhead. Real-world effective bitrate depends on codec, packetization, and network. Use a safe range.

| Audio mode | Rough per-direction budget | Duplex (both ways) | Notes |
| --- | --- | --- | --- |
| Opus voice (typical) | 30–80 kbps | 60–160 kbps | Depends on ptime, overhead, network |
| G.711 (PCMU/PCMA) | 80–110 kbps | 160–220 kbps | Common when bridging to SIP/PBX |

TURN relays the traffic, so TURN bandwidth is roughly proportional to concurrent calls × duplex bitrate × % relay.

5.3 Simple concurrency math (use your numbers)

A practical approximation: concurrent_calls ≈ (calls_per_hour × avg_duration_minutes) / 60. Then multiply by a peak factor to size for busy periods.

# Example (replace with your real metrics)
visitors_per_day = 50000
conversion = 0.02                 # 2% click-to-call
calls_per_day = visitors_per_day * conversion   # 1000 calls/day

# If most calls happen in 8 busy hours
calls_per_hour = calls_per_day / 8              # 125 calls/hour

avg_duration_min = 3
concurrent_avg = (calls_per_hour * avg_duration_min) / 60  # 6.25

peak_factor = 5
concurrent_peak = concurrent_avg * peak_factor  # ~31 calls peak

This is why “thousands of daily users” might still mean only dozens of concurrent calls. But if you are a call center entry point, concurrency can be hundreds or thousands.

5.4 TURN bandwidth estimate table

| Concurrent calls | Duplex bitrate (kbps) | % relay | Approx TURN egress (Mbps) | Interpretation |
| --- | --- | --- | --- | --- |
| 50 | 160 | 50% | ~4 | Small production, single node possible |
| 500 | 160 | 60% | ~48 | Need multiple TURN nodes + monitoring |
| 2,000 | 160 | 70% | ~224 | Multi-region TURN + capacity engineering |

Estimate: egress_mbps ≈ concurrent_calls × duplex_kbps × relay_fraction / 1000. This is approximate; plan safety margin.
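
The estimate is easy to script so you can plug in your own numbers. The sketch below applies the same formula to the three scenarios from the table and adds a simple conversion from a sustained rate to data volume per hour, which is the unit egress billing is usually based on. Function names are illustrative.

```python
def turn_egress_mbps(concurrent_calls, duplex_kbps, relay_fraction):
    """Approximate TURN egress: calls x duplex bitrate x share of calls relayed."""
    return concurrent_calls * duplex_kbps * relay_fraction / 1000

def gb_per_hour(mbps):
    """Convert a sustained rate in Mbps to decimal gigabytes transferred per hour."""
    return mbps / 8 * 3600 / 1000

# Same scenarios as the table: (concurrent calls, relay fraction) at 160 kbps duplex.
for calls, relay in [(50, 0.5), (500, 0.6), (2000, 0.7)]:
    mbps = turn_egress_mbps(calls, 160, relay)
    print(f"{calls} calls at {relay:.0%} relay: ~{mbps:.0f} Mbps, ~{gb_per_hour(mbps):.1f} GB/hour")
```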

The “hidden cost” at scale: TURN bandwidth.

If you run TURN in cloud environments with egress billing, your cost scales with usage. That’s why measuring % relay and placing TURN near users matters.

6) Multi-region design: where to place TURN + media servers

In voice, distance is delay. Delay becomes conversational friction. Multi-region is how you keep latency low for global users.

6.1 Basic placement rule

  • Place TURN close to users (or at least not extremely far).
  • Place media anchor (Janus/SFU) close to users too.
  • Place AI pipeline where it can respond quickly (or use regional edges).

If your TURN is in one country and your users are worldwide, even successful calls can feel “laggy”.

6.2 Common regional patterns

| Pattern | What it looks like | When it works | Where it fails |
| --- | --- | --- | --- |
| Single region | 1 TURN + 1 media cluster | Local users; pilot | Global latency, single failure domain |
| 2 regions | EU + US (or India + SG) | Two large user clusters | Users far from both regions |
| Multi-region | Several TURN/media edges | Global user base | More ops complexity |

Routing approach (simple): pick nearest region in the control plane.

  • When user loads the page, your backend chooses a region (GeoIP or latency probe).
  • You return region-specific iceServers (TURN URLs) and a region-specific signaling endpoint.
  • The call stays in-region as much as possible.

This reduces round-trip time for ICE checks and for media.
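
The routing steps above can be sketched as a small control-plane function. This version assumes the client has already measured a round-trip time to each region (a latency probe); GeoIP lookup would slot into the same place. The region names, host names, and URLs are hypothetical.

```python
# Hypothetical region table; in practice this would come from configuration.
REGIONS = {
    "eu": {"turn_host": "turn-eu.example.com", "signaling_url": "wss://signal-eu.example.com/ws"},
    "us": {"turn_host": "turn-us.example.com", "signaling_url": "wss://signal-us.example.com/ws"},
}

def pick_region(rtt_ms_by_region, default="us"):
    """Pick the known region with the lowest measured RTT; fall back to a default."""
    measured = {
        region: rtt
        for region, rtt in rtt_ms_by_region.items()
        if region in REGIONS and rtt is not None
    }
    if not measured:
        return default
    return min(measured, key=measured.get)

def session_config(rtt_ms_by_region):
    """Return the region-specific signaling endpoint and TURN URLs for one session."""
    region = pick_region(rtt_ms_by_region)
    cfg = REGIONS[region]
    return {
        "region": region,
        "signaling_url": cfg["signaling_url"],
        "turn_urls": [
            f"turn:{cfg['turn_host']}:3478?transport=udp",
            f"turns:{cfg['turn_host']}:443?transport=tcp",
        ],
    }

print(session_config({"eu": 40, "us": 120})["region"])
```

Keeping the choice in the control plane means the media plane never has to know about other regions.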

7) Bridging to SIP: Asterisk/FreeSWITCH patterns that scale

Many “Talk to Bot” buttons exist because you already have a SIP/PBX world: extensions, IVRs, queues, recordings, compliance, or PSTN. WebRTC becomes a new endpoint type, and you need a bridge.

7.1 The scalable pattern: WebRTC → Gateway → SIP

  • Browser uses WebRTC (SRTP + ICE)
  • Gateway (Janus) terminates SRTP and speaks SIP/RTP to PBX
  • PBX handles dialplan, queues, recording, etc.

Janus is popular here because the SIP plugin fits the bridge model well.

7.2 Where scaling pain appears

  • Transcoding: if codecs don’t match, CPU cost rises.
  • RTP port management: PBX needs correctly opened RTP ranges.
  • NAT: PBX NAT config differs from WebRTC ICE config.
  • Recording: recording every call increases I/O and storage.

| Choice | Good default | Why | Scaling impact |
| --- | --- | --- | --- |
| WebRTC codec | Opus | Great quality at low bitrate | May require transcoding to/from SIP codec |
| SIP codec | PCMU/PCMA (G.711) | Very common in PBX/PSTN land | Higher bitrate; transcoding if mismatch |
| Scaling strategy | Avoid transcoding when possible | CPU saver | May constrain codec selection |

Scaling tip: decide where transcoding happens (and pay the cost once).

  • If Janus transcodes, size Janus for CPU.
  • If PBX transcodes, size PBX for CPU.
  • If AI pipeline requires PCM16 internally, plan conversion at the edge of that pipeline.
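
As a rough illustration of the codec trade-off, the sketch below checks whether the two legs share a codec. If they do, the bridge can pass audio through without transcoding; if not, one side has to pay the CPU cost. It assumes both lists are non-empty and ordered by preference; the function name and codec lists are illustrative, and real negotiation happens in SDP rather than in a helper like this.

```python
def plan_codec(browser_codecs, pbx_codecs):
    """Return (webrtc_leg_codec, sip_leg_codec, needs_transcoding).

    Both lists are assumed non-empty and ordered by preference.
    """
    pbx = {codec.lower() for codec in pbx_codecs}
    for codec in browser_codecs:
        if codec.lower() in pbx:
            # Shared codec: same encoding on both legs, no transcoding needed.
            return codec, codec, False
    # No overlap: each leg uses its own first choice and something must transcode.
    return browser_codecs[0], pbx_codecs[0], True

print(plan_codec(["opus", "PCMU"], ["PCMU", "PCMA"]))
print(plan_codec(["opus"], ["PCMU", "PCMA"]))
```

The first call trades Opus quality and bitrate for zero transcoding; the second keeps Opus on the browser leg and requires a transcoder somewhere in the chain.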

Related: Connect Janus to Asterisk (Extension 7000) Using SIP + ARI

8) AI pipeline scaling: STT/TTS, barge-in, jitter buffers

Once you can carry audio reliably, the next bottleneck is the AI pipeline. Users don’t judge your system by “it connected” — they judge by how fast and natural the bot responds.

8.1 Latency budget (practical)

A human conversation feels good when responses start quickly. A practical target is roughly 500–800 ms or less from end-of-user-sentence to start-of-bot-speech, though it depends on use case and language.

| Stage | What happens | Latency risk |
| --- | --- | --- |
| Capture + jitter buffer | Browser/gateway smooths packet variation | Too large buffer = slow feel |
| STT | Speech → text (streaming) | Model speed + network |
| LLM | Reasoning + response planning | Token generation speed |
| TTS | Text → speech | Voice quality vs speed tradeoff |
| Playback | Send audio back to user | Packet pacing; drift |

8.2 Concurrency strategy

  • Streaming STT reduces perceived latency (partial transcripts).
  • Chunked TTS lets bot start speaking before full sentence is generated.
  • Barge-in requires you to stop TTS when user starts speaking.
  • Backpressure is essential: when AI is saturated, degrade gracefully.

At scale, your AI vendors/services have rate limits. You must build queueing and fallback logic.

Common scaling failure: media scales, AI doesn’t.

Your TURN and gateway might handle 1,000 concurrent calls, but your STT/TTS capacity might only handle 100. The user experience becomes: “Call connects, bot is slow or silent.”

Treat AI capacity like a core production resource with autoscaling and load shedding.
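
One simple form of load shedding is an admission gate in front of the AI pipeline: a fixed number of concurrent sessions, a short wait for a free slot, and an explicit "busy" outcome instead of a silent bot. The sketch below uses `asyncio`; the class name, limits, and the fake handler are assumptions for illustration, not part of any specific vendor SDK.

```python
import asyncio

class AiAdmission:
    """Cap concurrent AI sessions; shed load instead of queueing forever."""

    def __init__(self, max_sessions, max_wait_s):
        self._slots = asyncio.Semaphore(max_sessions)
        self._max_wait_s = max_wait_s

    async def run(self, handler):
        try:
            # Wait briefly for a free slot; give up rather than stall the caller.
            await asyncio.wait_for(self._slots.acquire(), timeout=self._max_wait_s)
        except asyncio.TimeoutError:
            return "shed"  # caller should play a "busy" prompt or offer a callback
        try:
            return await handler()
        finally:
            self._slots.release()

async def fake_bot_turn():
    # Stand-in for a real STT -> LLM -> TTS round trip.
    await asyncio.sleep(0.2)
    return "answered"

async def main():
    admission = AiAdmission(max_sessions=2, max_wait_s=0.02)
    return await asyncio.gather(*(admission.run(fake_bot_turn) for _ in range(5)))

results = asyncio.run(main())
print(results)
```

With two slots and five simultaneous requests, the requests that cannot get a slot within the wait window are shed, which gives the UI something concrete to react to.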

9) Observability: what to monitor (and why)

Scaling is not only adding servers — it’s knowing what is breaking before users complain. WebRTC gives you excellent client-side stats. Use them.

9.1 Must-have client QoE metrics

| Metric | What it tells you | Why it matters at scale |
| --- | --- | --- |
| Selected candidate type | host / srflx / relay | % relay drives cost and indicates NAT pain |
| Packet loss | Network quality | Predicts user complaints |
| Jitter | Variation in packet arrival | Predicts choppy audio |
| RTT | Round-trip time | Predicts conversational delay |
| Audio levels | Is mic working? Is playback working? | Detects permission/device issues |

Collect per-call summaries (not every sample) to avoid huge data volume.
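
A per-call summary can be as small as one record with averages and maxima. The sketch below assumes the client samples stats periodically and sends them (or the already-collapsed summary) at call end; the dictionary keys are hypothetical field names, not the names used by the browser stats API.

```python
from statistics import mean

def summarize_call(call_id, samples):
    """Collapse periodic stats samples into one per-call record.

    Each sample is a dict with hypothetical keys:
    jitter_ms, rtt_ms, loss_pct, candidate_type.
    """
    if not samples:
        return {"call_id": call_id, "samples": 0}
    return {
        "call_id": call_id,
        "samples": len(samples),
        "candidate_type": samples[-1]["candidate_type"],  # last selected pair
        "jitter_ms_avg": mean(s["jitter_ms"] for s in samples),
        "jitter_ms_max": max(s["jitter_ms"] for s in samples),
        "rtt_ms_avg": mean(s["rtt_ms"] for s in samples),
        "rtt_ms_max": max(s["rtt_ms"] for s in samples),
        "loss_pct_max": max(s["loss_pct"] for s in samples),
    }

summary = summarize_call("c-1234", [
    {"jitter_ms": 4, "rtt_ms": 80, "loss_pct": 0.0, "candidate_type": "srflx"},
    {"jitter_ms": 12, "rtt_ms": 120, "loss_pct": 1.5, "candidate_type": "relay"},
])
print(summary)
```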

9.2 Must-have server metrics (TURN + gateway)

  • TURN allocations per minute, active allocations
  • TURN bandwidth in/out, per-region
  • Gateway sessions, CPU, memory, packet drops
  • ICE success rate, time to connect
  • Call setup failure reasons (auth, SDP, ICE, DTLS, media)

At scale, you want dashboards that answer: “Is it our network, our gateway, our AI, or the user’s device?”

Practical logging tip: attach a single call_id everywhere.

  • Browser logs include call_id.
  • Signaling logs include call_id.
  • TURN/gateway session includes call_id (or mapping token).
  • AI pipeline logs include call_id.

When something fails, you can trace it end-to-end in minutes instead of guessing.
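
On the server side, one lightweight way to do this in Python is a `logging.LoggerAdapter` that stamps every record with the call_id. The component name and id below are placeholders. Note that with this format string, records logged without a call_id will fail to format, so in a real service you would apply it only to call-scoped loggers or add a default.

```python
import logging

def call_logger(component, call_id):
    """Return a logger that stamps every record with the given call_id."""
    base = logging.getLogger(component)
    return logging.LoggerAdapter(base, {"call_id": call_id})

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s call_id=%(call_id)s %(message)s",
)

# Placeholder component name and id; reuse the same id in every layer of the stack.
log = call_logger("signaling", "c-1234")
log.info("offer received")
log.warning("ICE took longer than expected")
```

Searching logs for a single call_id then returns the browser, signaling, gateway, and AI events for that call in one query.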

10) Load testing WebRTC properly (not just HTTP)

A common trap: “We load-tested our API with 10,000 requests per second, so we’re ready.” That tests the signaling server. It does not test the media plane.

10.1 What you must test

  • ICE success rate under load
  • TURN allocations under load
  • Gateway CPU/memory under concurrent sessions
  • Audio QoE metrics (loss/jitter) under load
  • AI pipeline concurrency and tail latency (p95/p99)

10.2 How to test realistically

  • Use headless browsers or WebRTC test clients
  • Run tests from multiple networks/regions
  • Include TURN/TLS 443 scenarios (enterprise)
  • Include packet loss/jitter simulation (tc/netem)

Your goal is not only “connects”. Your goal is “sounds good” under load.

Most important load test output: the “long tail”.

Voice failures often appear at p95/p99 (rare but impactful). Track time_to_first_audio, ice_time, and AI response latency at p95/p99.
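
To see why the tail matters, compare the median with p95/p99 on the same data. The sketch below uses a nearest-rank percentile so the result is always a real observed sample; the sample values are made up to show two slow calls hiding behind a healthy median.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def tail_report(samples_ms):
    """Summarize a latency metric at the median and in the tail."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}

# Hypothetical time_to_first_audio samples: 98 fast calls and two slow ones.
time_to_first_audio_ms = [100] * 98 + [900, 2500]
report = tail_report(time_to_first_audio_ms)
print(report)
```

The median looks fine here while p99 exposes the slow calls, which is exactly the pattern an average would hide.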

11) Security & abuse prevention at scale

When you open a public voice button, you invite abuse: bots, credential reuse, TURN bandwidth theft, and denial-of-service attempts. Security is part of scaling.

11.1 TURN abuse prevention

  • Use ephemeral TURN credentials (short TTL)
  • Rate-limit session creation per IP/user/token
  • Restrict TURN realms/origins and log allocations
  • Alert on unusual bandwidth spikes
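
Rate limiting session creation can be sketched with a token bucket per client key (IP, user, or token). This is an in-memory, single-process illustration with made-up limits; a multi-node deployment would need shared state, which is outside the scope of this sketch.

```python
import time

class TokenBucket:
    """Allow short bursts while enforcing a long-run rate."""

    def __init__(self, rate_per_s, burst, now=None):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}

def allow_session(client_key, now=None):
    """Gate session creation per client key (hypothetical limits: 3 burst, 1 per 10 s)."""
    bucket = buckets.get(client_key)
    if bucket is None:
        bucket = buckets[client_key] = TokenBucket(rate_per_s=0.1, burst=3, now=now)
    return bucket.allow(now)
```

A request that is refused here should never reach the step that issues TURN credentials, which keeps abusive clients from consuming relay bandwidth.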

11.2 Call/session abuse prevention

  • Require a signed session token before creating a call
  • CAPTCHA or proof-of-work on suspicious traffic
  • Hard caps: max call duration, max concurrent calls per account
  • Blocklists for repeated offenders
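
A signed session token can be as simple as a session id, an expiry time, and an HMAC over both. The sketch below is one possible shape, assuming a server-side secret shared between the page backend and the signaling service; the token layout and function names are illustrative, and an established format such as JWT would serve the same purpose.

```python
import hashlib
import hmac
import time

def sign_session(secret, session_id, ttl_s=60, now=None):
    """Issue "<session_id>.<expiry>.<signature>" using a server-side secret (bytes)."""
    expires = int(time.time() if now is None else now) + ttl_s
    payload = f"{session_id}.{expires}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_session(secret, token, now=None):
    """Return the session id if the token is authentic and unexpired, else None."""
    try:
        session_id, expires, sig = token.rsplit(".", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, f"{session_id}.{expires}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch
    current = time.time() if now is None else now
    return session_id if current < expires_at else None

# Placeholder secret; the signaling server verifies the token before creating a call.
token = sign_session(b"change-me", "sess-42", ttl_s=60, now=1000)
print(verify_session(b"change-me", token, now=1030))
```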

Scaling reality: without guardrails, your TURN bill can become your biggest surprise.

12) Production checklists and common failure modes

12.1 Production checklist (media plane)

| Check | Good looks like | Common failure |
| --- | --- | --- |
| TURN reachable | UDP + TLS 443 both work | Only UDP works; enterprise users fail |
| Relay port range open | Packets on relay range visible | 3478 open but relay ports blocked |
| % relay measured | Dashboard shows relay usage | Costs spike “mysteriously” |
| Gateway CPU headroom | < 60% at peak | High CPU causes jitter/packet drop |
| Multi-region routing | User gets nearest region | All users forced through one region |

12.2 Common failure modes (symptoms → fixes)

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| “Connecting…” forever | No working ICE pair; TURN missing or blocked | Add TURN/TLS 443; verify relay candidates exist |
| Connects, no audio | Relay range blocked; gateway packet drop; codec mismatch | Open relay range; inspect webrtc-internals; check transcoding |
| Audio choppy at peak | Gateway CPU or NIC saturated | Scale out gateway; reduce transcoding; optimize ptime |
| Bot replies late | AI pipeline saturation | Autoscale STT/TTS; add queueing + degradation |
| Works in dev only | NAT/firewall differences in real networks | Test on mobile + corp networks; TURN/TLS |

Quick debugging order (fastest wins first):

  1. Check chrome://webrtc-internals selected candidate type (host/srflx/relay)
  2. Force relay for one test call (prove NAT problem vs non-NAT problem)
  3. Check TURN server logs and relay-port traffic (tcpdump)
  4. Check gateway CPU and packet loss/jitter metrics
  5. Check AI pipeline latency and rate limits

Related reading: ICE vs STUN vs TURN — Complete WebRTC Networking Guide
