Scaling a Public “Talk to Bot” Button to Thousands of Concurrent Users

MYLINEHUB Team • 2026-02-23 • 18 min

Production scaling guide for WebRTC voice buttons covering DTLS cost, TURN capacity planning, horizontal scaling, and avoiding Asterisk bottlenecks.

WebRTC • Click-to-Call Button • AI Voice Bot • Janus/LiveKit/mediasoup • TURN • Asterisk/FreeSWITCH

A “Talk to Bot” button looks simple: user clicks, browser asks for mic permission, and suddenly they are talking to an AI voice bot. But when you scale from 10 testers to 10,000 daily users (or thousands of concurrent users), everything changes.

The biggest mistake teams make is treating WebRTC like a normal API call: “If our web server scales, the voice button will scale.” WebRTC is real-time media — latency-sensitive, bandwidth-heavy, NAT-sensitive, and it requires different scaling and monitoring than typical web apps.

What you will get from this guide:

A novice-friendly explanation first (layman examples), then deeper engineering detail.
Concrete scaling patterns: TURN, gateway/SFU sizing, region placement, and autoscaling triggers.
Capacity planning tables (bandwidth math, concurrency, CPU/RAM considerations).
Production checklists: ports, NAT, observability, load testing, and “why it works in dev but fails in prod”.
Where SIP servers (Asterisk/FreeSWITCH) fit, and when to introduce Janus / LiveKit / mediasoup.

1) Layman view: what “scaling voice” really means
2) The components of a click-to-call AI voice stack
3) Topologies: P2P, Gateway (Janus), SFU (LiveKit/mediasoup)
4) TURN at scale: the reliability layer you must plan for
5) Capacity planning: bandwidth, concurrency, CPU, cost
6) Multi-region design: where to place TURN + media servers
7) Bridging to SIP: Asterisk/FreeSWITCH patterns that scale
8) AI pipeline scaling: STT/TTS, barge-in, jitter buffers
9) Observability: what to monitor (and why)
10) Load testing WebRTC properly (not just HTTP)
11) Security & abuse prevention at scale
12) Production checklists and common failure modes
13) References & next reading

1) Layman view: what “scaling voice” really means

Scaling a web page vs scaling a live call

A web page is like sending a PDF to someone — you can cache it, retry it, and deliver it eventually. A live voice call is like a walkie-talkie conversation — if packets arrive late, the conversation feels broken.

Web: “If it’s slow, user waits.”
Voice: “If it’s slow, user hangs up.”

Scaling voice is mostly about 4 things

Network reachability (NAT/firewalls) → solved by TURN
Bandwidth (media traffic) → planning + multi-region
Real-time processing (STT/TTS/AI) → concurrency + backpressure
Operational visibility → monitoring that tells you what users experience

Rule of thumb: if your click-to-call works for 95% of users, it will still feel “broken”, because one visitor in twenty cannot complete a call. People remember voice failures more than slow page loads.

Your target is not “mostly works”. Your target is “works everywhere”.

2) The components of a click-to-call AI voice stack

Browser UI • Signaling • ICE/STUN/TURN • Media Server/Gateway • SIP/PBX • AI Voice Pipeline • Observability

A production voice button is not one server. It’s a chain. When calls fail, you must know which link failed.

Layer	Example tech	Why it exists	Scaling concern
Browser UI	JS + WebRTC APIs	Mic capture, UX, permissions	Device diversity, permission flows
Signaling	HTTPS/WSS	Exchange SDP and ICE candidates	State, auth, retries
NAT traversal	STUN/TURN (coturn)	Reachability across networks	Bandwidth + regional placement
Media gateway/SFU	Janus / LiveKit / mediasoup	Anchor media; bridge to SIP or AI pipeline	CPU, memory, packet pacing
SIP/PBX (optional)	Asterisk / FreeSWITCH	Enterprise calling, extensions, PSTN	RTP ports, transcoding cost
AI voice	STT + LLM + TTS	Understand and respond	Latency budgets, concurrency limits
Observability	Metrics/logs/traces + QoE	Know why calls fail	Cardinality, sampling

Scaling tip: split “control plane” from “media plane”.

Control plane = signaling, auth, session routing (scales like web)
Media plane = RTP/SRTP packets, TURN relay traffic (scales like networking)

3) Topologies: P2P, Gateway (Janus), SFU (LiveKit/mediasoup)

3.1 Pure P2P (not recommended for bot calls)

Browser sends media directly to another peer. Great for two humans on good networks. Bad for “browser ↔ bot” because your bot is not a browser and you still need an anchor server.

Pros: low infra cost when it works
Cons: NAT complexity, hard to integrate SIP, hard to measure QoE

3.2 Gateway model (Janus as anchor)

Browser connects to Janus. Janus terminates WebRTC and provides plugin-based routing: SIP plugin to Asterisk, or custom logic to an AI pipeline.

Pros: simple mental model, great SIP bridge story
Cons: you must size gateways and TURN; multi-region planning

3.3 SFU model (LiveKit/mediasoup)

Browser connects to an SFU. SFU optimizes multi-party calls and routing. For a bot button, you’re often still doing 1:1, but SFUs offer strong tooling and scaling patterns.

Pros: mature scaling patterns, multi-party ready
Cons: SIP bridging can be extra work; complexity may be higher

Practical recommendation for “Talk to Bot”:

Use an anchor server (Janus or SFU) rather than “direct to bot”.
Use TURN as a required fallback path (and size it).
Split control-plane vs media-plane scaling.

4) TURN at scale: the reliability layer you must plan for

At small scale, you might get away with only STUN — most tests happen on permissive networks. At real scale, your user base includes corporate firewalls, carrier NAT, and “weird routers”. TURN is what converts “sometimes works” into “works reliably”.

4.1 TURN is not just “one server”

Placement matters: TURN far away increases latency and jitter.
Bandwidth matters: TURN relays media — it carries your traffic.
Protocol coverage matters:
- TURN/UDP (best)
- TURN/TCP (fallback)
- TURN/TLS on 443 (enterprise networks)

4.2 TURN usage rate (why you must measure it)

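Not all users will use TURN. ICE gives direct candidates (host/srflx) higher priority and normally selects a relay candidate only when no direct pair passes connectivity checks. A healthy system will have a mix: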

Some sessions: direct (srflx)
Some sessions: relay (TURN)

Your cost depends on % relay. Without monitoring, you are flying blind.
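
Measuring the mix can start very small. The sketch below assumes your client reports the selected candidate type once per call (the input list and the function name `candidate_mix` are illustrative, not part of any library) and computes the share of each type:

```python
from collections import Counter

def candidate_mix(selected_types):
    """Return the share of calls per selected ICE candidate type.

    `selected_types` holds one string per completed call, such as
    "host", "srflx", or "relay" (a hypothetical per-call summary field).
    """
    counts = Counter(selected_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: count / total for kind, count in counts.items()}

# Made-up sample: 3 of 10 calls ended up on a TURN relay.
sample = ["srflx"] * 6 + ["host"] + ["relay"] * 3
mix = candidate_mix(sample)
print(f"relay share: {mix.get('relay', 0.0):.0%}")
```

Tracking this number per region and per day is usually enough to explain most TURN cost changes.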

Decision	Option	Impact	When to choose
TURN policy	TURN as fallback	Lower cost; slightly higher failure risk when a direct path passes ICE checks but then carries media poorly	When you have good monitoring and can iterate
TURN policy	Force relay (iceTransportPolicy=relay)	Highest reliability, higher cost, consistent behavior	When you must be “enterprise reliable” or debugging
TURN geography	Single region	Simple, higher latency for distant users	Pilot / local audience
TURN geography	Multi-region	Lower latency globally, operational complexity	Global users, higher concurrency

Minimum viable production TURN:

At least 2 TURN nodes (HA), ideally behind DNS-based routing.
TURN/UDP + TURN/TLS 443 enabled.
Relay port range opened on firewall.
Ephemeral credentials issued by backend (no static creds in JS).
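
One common way to issue ephemeral credentials is the shared-secret scheme that coturn supports through its `use-auth-secret` and `static-auth-secret` options: the username carries an expiry timestamp and the password is an HMAC of that username. The sketch below assumes that setup; the function name, the user id format, and the 10-minute TTL are illustrative choices.

```python
import base64
import hashlib
import hmac
import time

def make_turn_credentials(shared_secret, user_id, ttl_seconds=600, now=None):
    """Build short-lived TURN credentials for a shared-secret TURN server.

    username   = "<expiry unix time>:<user id>"
    credential = base64(HMAC-SHA1(shared_secret, username))
    """
    issued_at = int(time.time()) if now is None else int(now)
    username = f"{issued_at + ttl_seconds}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return {
        "username": username,
        "credential": base64.b64encode(digest).decode(),
        "ttl": ttl_seconds,
    }

# The secret and user id here are placeholders; keep the real secret server-side.
creds = make_turn_credentials("s3cret", "visitor-42", ttl_seconds=600, now=1_700_000_000)
```

The backend returns `username` and `credential` to the browser as part of `iceServers`; the shared secret itself never reaches the page.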

5) Capacity planning: bandwidth, concurrency, CPU, cost

“Thousands of users” can mean: thousands of daily visitors (low concurrency) or thousands of concurrent calls (high concurrency). Concurrency is what drives infra sizing.

5.1 Define your traffic reality

Metric	Example	Why it matters
Daily unique visitors	50,000/day	Controls auth/signaling scale
Click-to-call conversion	2%	How many sessions become calls
Average call duration	3 minutes	Controls concurrent load
Peak factor	5× average	Determines worst-case sizing
% calls using TURN	30–70%	Determines TURN bandwidth cost

Without these numbers, you can’t size TURN/media servers correctly.

5.2 Bandwidth is the first hard limit

Even “audio only” has overhead. Real-world effective bitrate depends on codec, packetization, and network. Use a safe range.

Audio mode	Rough per-direction budget	Duplex (both ways)	Notes
Opus voice (typical)	30–80 kbps	60–160 kbps	Depends on ptime, overhead, network
G.711 (PCMU/PCMA)	80–110 kbps	160–220 kbps	Common when bridging to SIP/PBX

TURN relays the traffic, so TURN bandwidth is roughly proportional to concurrent calls × duplex bitrate × % relay.
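
To see where the per-direction ranges come from, it helps to add packet headers to the codec bitrate. This is a rough sketch: the 50-byte overhead (IPv4 + UDP + RTP + a 10-byte SRTP auth tag) and the 32 kbps Opus target are assumptions, and TURN framing, IPv6, or a different SRTP profile will change the result.

```python
def on_wire_kbps(codec_kbps, ptime_ms=20, overhead_bytes=50):
    """Estimate one-direction bitrate including per-packet overhead.

    overhead_bytes defaults to 50: IPv4 (20) + UDP (8) + RTP (12)
    + an assumed 10-byte SRTP auth tag.
    """
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000

g711 = on_wire_kbps(64)   # G.711 payload is 64 kbps
opus = on_wire_kbps(32)   # assumed Opus voice target of 32 kbps
print(f"G.711 ~{g711:.0f} kbps, Opus ~{opus:.0f} kbps per direction")
```

Shorter packetization (a smaller ptime) means more packets per second, so the same codec costs more on the wire.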

5.3 Simple concurrency math (use your numbers)

A practical approximation (Little's law): concurrent_calls ≈ (calls_per_hour × avg_duration_minutes) / 60, then multiply by a peak factor.

# Example (replace with your real metrics)
visitors_per_day = 50000
conversion = 0.02                 # 2% click-to-call
calls_per_day = visitors_per_day * conversion   # 1000 calls/day

# If most calls happen in 8 busy hours
calls_per_hour = calls_per_day / 8              # 125 calls/hour

avg_duration_min = 3
concurrent_avg = (calls_per_hour * avg_duration_min) / 60  # 6.25

peak_factor = 5
concurrent_peak = concurrent_avg * peak_factor  # ~31 calls peak

This is why “thousands of daily users” might still mean only dozens of concurrent calls. But if you are a call center entry point, concurrency can be hundreds or thousands.

5.4 TURN bandwidth estimate table

Concurrent calls	Duplex bitrate (kbps)	% relay	Approx TURN egress (Mbps)	Interpretation
50	160	50%	~4 Mbps	Small production, single node possible
500	160	60%	~48 Mbps	Need multiple TURN nodes + monitoring
2,000	160	70%	~224 Mbps	Multi-region TURN + capacity engineering

Estimate: egress_mbps ≈ concurrent_calls × duplex_kbps × relay_fraction / 1000. This is approximate; plan safety margin.
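
The same estimate as a small function, so you can plug in your own numbers. The `safety_margin` parameter is an addition for illustration (0.3 means 30% headroom); the three sample rows reuse the table values above.

```python
def turn_egress_mbps(concurrent_calls, duplex_kbps, relay_fraction, safety_margin=0.0):
    """Approximate TURN egress in Mbps, optionally padded with headroom."""
    base = concurrent_calls * duplex_kbps * relay_fraction / 1000
    return base * (1 + safety_margin)

for calls, relay in [(50, 0.5), (500, 0.6), (2000, 0.7)]:
    estimate = turn_egress_mbps(calls, 160, relay)
    padded = turn_egress_mbps(calls, 160, relay, safety_margin=0.3)
    print(f"{calls} calls at {relay:.0%} relay: ~{estimate:.0f} Mbps, ~{padded:.0f} Mbps with 30% headroom")
```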

The “hidden cost” at scale: TURN bandwidth.

If you run TURN in cloud environments with egress billing, your cost scales with usage. That’s why measuring % relay and placing TURN near users matters.

6) Multi-region design: where to place TURN + media servers

In voice, distance is delay. Delay becomes conversational friction. Multi-region is how you keep latency low for global users.

6.1 Basic placement rule

Place TURN close to users (or at least not extremely far).
Place media anchor (Janus/SFU) close to users too.
Place AI pipeline where it can respond quickly (or use regional edges).

If your TURN is in one country and your users are worldwide, even successful calls can feel “laggy”.

6.2 Common regional patterns

Pattern	What it looks like	When it works	Where it fails
Single region	1 TURN + 1 media cluster	Local users; pilot	Global latency, single failure domain
2 regions	EU + US (or India + SG)	Two large user clusters	Users far from both regions
Multi-region	Several TURN/media edges	Global user base	More ops complexity

Routing approach (simple): pick nearest region in the control plane.

When the user loads the page, your backend chooses a region (GeoIP or latency probe).
The backend returns region-specific iceServers (TURN URLs) and a region-specific signaling endpoint.
The call stays in-region as much as possible.

This reduces round-trip time for ICE checks and for media.
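
A minimal sketch of that control-plane step, assuming the client or an edge probe supplies a latency measurement per region. The region table, hostnames, and function names are placeholders for illustration.

```python
# Hypothetical region table; hostnames are placeholders, not real endpoints.
REGIONS = {
    "eu": {"turn_host": "turn-eu.example.com", "signaling": "wss://eu.example.com/ws"},
    "us": {"turn_host": "turn-us.example.com", "signaling": "wss://us.example.com/ws"},
    "in": {"turn_host": "turn-in.example.com", "signaling": "wss://in.example.com/ws"},
}

def pick_region(latency_ms_by_region, default="eu"):
    """Pick the known region with the lowest measured latency."""
    known = {r: ms for r, ms in latency_ms_by_region.items() if r in REGIONS}
    if not known:
        return default  # no usable probe data, fall back to a fixed region
    return min(known, key=known.get)

def session_config(region, username, credential):
    """Build the region-specific payload the page needs to start a call."""
    host = REGIONS[region]["turn_host"]
    return {
        "signalingUrl": REGIONS[region]["signaling"],
        "iceServers": [{
            "urls": [
                f"turn:{host}:3478?transport=udp",
                f"turn:{host}:3478?transport=tcp",
                f"turns:{host}:443?transport=tcp",
            ],
            "username": username,
            "credential": credential,
        }],
    }

region = pick_region({"eu": 180, "in": 35})
config = session_config(region, "1700000600:visitor-42", "placeholder-credential")
```

GeoIP can replace the latency map as the input; the point is that the decision lives in the control plane and the browser just receives a ready-made config.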

7) Bridging to SIP: Asterisk/FreeSWITCH patterns that scale

Many “Talk to Bot” buttons exist because you already have a SIP/PBX world: extensions, IVRs, queues, recordings, compliance, or PSTN. WebRTC becomes a new endpoint type, and you need a bridge.

7.1 The scalable pattern: WebRTC → Gateway → SIP

Browser uses WebRTC (SRTP + ICE)
Gateway (Janus) terminates SRTP and speaks SIP/RTP to PBX
PBX handles dialplan, queues, recording, etc.

Janus is popular here because the SIP plugin fits the bridge model well.

7.2 Where scaling pain appears

Transcoding: if codecs don’t match, CPU cost rises.
RTP port management: PBX needs correctly opened RTP ranges.
NAT: PBX NAT config differs from WebRTC ICE config.
Recording: recording every call increases I/O and storage.

Choice	Good default	Why	Scaling impact
WebRTC codec	Opus	Great quality at low bitrate	May require transcoding to/from SIP codec
SIP codec	PCMU/PCMA (G.711)	Very common in PBX/PSTN land	Higher bitrate; transcoding if mismatch
Scaling strategy	Avoid transcoding when possible	CPU saver	May constrain codec selection

Scaling tip: decide where transcoding happens (and pay the cost once).

If Janus transcodes, size Janus for CPU.
If PBX transcodes, size PBX for CPU.
If AI pipeline requires PCM16 internally, plan conversion at the edge of that pipeline.

8) AI pipeline scaling: STT/TTS, barge-in, jitter buffers

Once you can carry audio reliably, the next bottleneck is the AI pipeline. Users don’t judge your system by “it connected” — they judge by how fast and natural the bot responds.

8.1 Latency budget (practical)

A human conversation feels good when responses start quickly. A practical target is 500–800 ms or less from end-of-user-sentence to start-of-bot-speech, though it depends on use case and language.

Stage	What happens	Latency risk
Capture + jitter buffer	Browser/gateway smooths packet variation	Too large buffer = slow feel
STT	Speech → text (streaming)	Model speed + network
LLM	Reasoning + response planning	Token generation speed
TTS	Text → speech	Voice quality vs speed tradeoff
Playback	Send audio back to user	Packet pacing; drift
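
One way to make the budget concrete is to write down a per-stage allowance and check the sum against the target. All stage numbers below are invented placeholders; replace them with your own measurements.

```python
def check_budget(stage_ms, target_ms=800):
    """Sum per-stage latencies and flag the stage that costs the most."""
    total = sum(stage_ms.values())
    slowest = max(stage_ms, key=stage_ms.get)
    return {
        "total_ms": total,
        "within_target": total <= target_ms,
        "slowest_stage": slowest,
    }

# Hypothetical measurements in milliseconds, one entry per stage.
example = {
    "jitter_buffer": 60,
    "stt_finalize": 200,
    "llm_first_token": 250,
    "tts_first_audio": 150,
    "playback": 40,
}
report = check_budget(example)
print(report)
```

If the total drifts over target, the slowest stage is the first place to look.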

8.2 Concurrency strategy

Streaming STT reduces perceived latency (partial transcripts).
Chunked TTS lets the bot start speaking before the full response is generated.
Barge-in requires you to stop TTS when user starts speaking.
Backpressure is essential: when AI is saturated, degrade gracefully.

At scale, your AI vendors/services have rate limits. You must build queueing and fallback logic.

Common scaling failure: media scales, AI doesn’t.

Your TURN and gateway might handle 1,000 concurrent calls, but your STT/TTS capacity might only handle 100. The user experience becomes: “Call connects, bot is slow or silent.”

Treat AI capacity like a core production resource with autoscaling and load shedding.
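
A minimal load-shedding sketch, assuming an asyncio-based service: a semaphore caps active AI sessions, and a caller that cannot get a slot within a short wait receives a fallback (for example a "please hold" prompt) instead of silence. The class name, limits, and the demo handlers are illustrative.

```python
import asyncio

class AiCapacityGate:
    """Admit at most `max_active` AI sessions; shed the rest quickly."""

    def __init__(self, max_active, wait_seconds=0.5):
        self._slots = asyncio.Semaphore(max_active)
        self._wait_seconds = wait_seconds

    async def run(self, handler, fallback):
        try:
            # Wait briefly for a free slot rather than queueing forever.
            await asyncio.wait_for(self._slots.acquire(), timeout=self._wait_seconds)
        except asyncio.TimeoutError:
            return await fallback()
        try:
            return await handler()
        finally:
            self._slots.release()

async def demo():
    gate = AiCapacityGate(max_active=2, wait_seconds=0.05)

    async def talk_to_bot():
        await asyncio.sleep(0.2)  # stands in for STT + LLM + TTS work
        return "bot"

    async def busy_message():
        return "busy"

    return await asyncio.gather(*(gate.run(talk_to_bot, busy_message) for _ in range(5)))

results = asyncio.run(demo())
print(results)
```

With two slots and five simultaneous callers, two get the bot and three get the fallback, which is a better experience than five slow or silent calls.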

9) Observability: what to monitor (and why)

Scaling is not only adding servers — it’s knowing what is breaking before users complain. WebRTC gives you excellent client-side stats. Use them.

9.1 Must-have client QoE metrics

Metric	What it tells you	Why it matters at scale
Selected candidate type	host / srflx / relay	% relay drives cost and indicates NAT pain
Packet loss	Network quality	Predicts user complaints
Jitter	Variation in packet arrival	Predicts choppy audio
RTT	Round-trip time	Predicts conversational delay
Audio levels	Is mic working? is playback working?	Detects permission/device issues

Collect per-call summaries (not every sample) to avoid huge data volume.
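
A sketch of that reduction step, assuming the client samples stats periodically during the call and sends one record at hangup. The sample keys (`loss_pct`, `jitter_ms`, `rtt_ms`, `candidate_type`) are hypothetical names, not fields of any specific API.

```python
from statistics import mean

def summarize_call(call_id, samples):
    """Collapse periodic stats samples into one per-call record."""
    if not samples:
        return {"call_id": call_id, "samples": 0}
    summary = {
        "call_id": call_id,
        "samples": len(samples),
        # The last sample reflects the pair in use when the call ended.
        "candidate_type": samples[-1]["candidate_type"],
    }
    for key in ("loss_pct", "jitter_ms", "rtt_ms"):
        values = [s[key] for s in samples]
        summary[f"{key}_avg"] = mean(values)
        summary[f"{key}_max"] = max(values)
    return summary

samples = [
    {"loss_pct": 0.0, "jitter_ms": 8, "rtt_ms": 90, "candidate_type": "srflx"},
    {"loss_pct": 2.0, "jitter_ms": 12, "rtt_ms": 110, "candidate_type": "srflx"},
    {"loss_pct": 4.0, "jitter_ms": 25, "rtt_ms": 130, "candidate_type": "relay"},
]
summary = summarize_call("call-123", samples)
```

Keeping both the average and the maximum matters: a call with a fine average can still contain a few seconds that made the user hang up.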

9.2 Must-have server metrics (TURN + gateway)

TURN allocations per minute, active allocations
TURN bandwidth in/out, per-region
Gateway sessions, CPU, memory, packet drops
ICE success rate, time to connect
Call setup failure reasons (auth, SDP, ICE, DTLS, media)

At scale, you want dashboards that answer: “Is it our network, our gateway, our AI, or the user’s device?”

Practical logging tip: attach a single call_id everywhere.

Browser logs include call_id.
Signaling logs include call_id.
TURN/gateway session includes call_id (or mapping token).
AI pipeline logs include call_id.

When something fails, you can trace it end-to-end in minutes instead of guessing.
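
On the server side, one way to do this in Python is a `logging.LoggerAdapter` that stamps every record with the call_id. The logger names (`voice`, `voice.signaling`) and the log format are assumptions for illustration.

```python
import io
import logging

def configure_voice_logging(stream):
    """Attach a handler whose format always includes call_id."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s call_id=%(call_id)s %(message)s"))
    parent = logging.getLogger("voice")
    parent.handlers = [handler]
    parent.setLevel(logging.INFO)
    parent.propagate = False  # keep these records out of unrelated handlers
    return parent

def call_logger(component, call_id):
    """Return a logger that adds call_id to every record it emits."""
    logger = logging.getLogger(f"voice.{component}")
    return logging.LoggerAdapter(logger, {"call_id": call_id})

buffer = io.StringIO()
configure_voice_logging(buffer)
call_logger("signaling", "abc123").info("offer received")
call_logger("ai", "abc123").info("stt stream opened")
print(buffer.getvalue(), end="")
```

Searching the logs for `call_id=abc123` then returns the signaling and AI events for the same call in one query.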

10) Load testing WebRTC properly (not just HTTP)

A common trap: “We load-tested our API with 10,000 requests per second, so we’re ready.” That tests the signaling server. It does not test the media plane.

10.1 What you must test

ICE success rate under load
TURN allocations under load
Gateway CPU/memory under concurrent sessions
Audio QoE metrics (loss/jitter) under load
AI pipeline concurrency and tail latency (p95/p99)

10.2 How to test realistically

Use headless browsers or WebRTC test clients
Run tests from multiple networks/regions
Include TURN/TLS 443 scenarios (enterprise)
Include packet loss/jitter simulation (tc/netem)

Your goal is not only “connects”. Your goal is “sounds good” under load.

Most important load test output: the “long tail”.

Voice failures often appear at p95/p99 (rare but impactful). Track time_to_first_audio, ice_time, and AI response latency at p95/p99.
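
A small nearest-rank percentile helper is enough to pull the tail out of load test results. The `time_to_first_audio` values below are invented to show how a median can look healthy while p95 does not.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating point surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical time_to_first_audio samples in milliseconds.
time_to_first_audio = [420, 450, 480, 510, 530, 560, 600, 640, 900, 2400]
p50 = percentile(time_to_first_audio, 50)
p95 = percentile(time_to_first_audio, 95)
print(f"p50={p50} ms, p95={p95} ms")
```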

11) Security & abuse prevention at scale

When you open a public voice button, you invite abuse: bots, credential reuse, TURN bandwidth theft, and denial-of-service attempts. Security is part of scaling.

11.1 TURN abuse prevention

Use ephemeral TURN credentials (short TTL)
Rate-limit session creation per IP/user/token
Restrict TURN realms/origins and log allocations
Alert on unusual bandwidth spikes
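
Rate limiting session creation can be as simple as a token bucket keyed by IP or token. This is an in-memory sketch with assumed limits (a burst of 3, then 1 per second); a multi-node deployment would need shared state and eviction of idle keys.

```python
import time

class TokenBucket:
    """Per-key token bucket: `burst` requests at once, then `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # key -> (tokens, last_seen)

    def allow(self, key):
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill based on elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False

# A fake clock makes the behaviour easy to follow.
fake_now = [0.0]
limiter = TokenBucket(rate_per_sec=1, burst=3, clock=lambda: fake_now[0])
first_burst = [limiter.allow("203.0.113.7") for _ in range(4)]
fake_now[0] = 2.0
after_wait = [limiter.allow("203.0.113.7") for _ in range(3)]
```

Call `allow()` before issuing TURN credentials, so a blocked client never receives a usable relay allocation.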

11.2 Call/session abuse prevention

Require a signed session token before creating a call
CAPTCHA or proof-of-work on suspicious traffic
Hard caps: max call duration, max concurrent calls per account
Blocklists for repeated offenders
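
A signed session token can be built from the standard library alone. The sketch below is a simplified HMAC-SHA256 scheme with assumed claim names and a 60-second TTL; in production you may prefer an established format such as JWT with a maintained library.

```python
import base64
import hashlib
import hmac
import json
import time

def sign_session(secret, claims, ttl_seconds=60, now=None):
    """Return "<base64 payload>.<hex signature>" with an expiry claim."""
    issued = int(time.time()) if now is None else int(now)
    payload = dict(claims, exp=issued + ttl_seconds)
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode()).decode()
    signature = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"

def verify_session(secret, token, now=None):
    """Return the claims if the signature is valid and not expired, else None."""
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return claims if claims["exp"] > current else None

secret = b"replace-with-a-real-secret"  # placeholder, keep the real one server-side
token = sign_session(secret, {"visitor": "v1"}, ttl_seconds=60, now=1000)
claims = verify_session(secret, token, now=1030)
```

The signaling server verifies the token before it creates a call or hands out TURN credentials, so an unauthenticated script cannot start sessions directly.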

Scaling reality: without guardrails, your TURN bill can become your biggest surprise.

12) Production checklists and common failure modes

12.1 Production checklist (media plane)

Check	Good looks like	Common failure
TURN reachable	UDP + TLS 443 both work	Only UDP works; enterprise users fail
Relay port range open	Packets on relay range visible	3478 open but relay ports blocked
% relay measured	Dashboard shows relay usage	Costs spike “mysteriously”
Gateway CPU headroom	< 60% at peak	High CPU causes jitter/packet drop
Multi-region routing	User gets nearest region	All users forced through one region

12.2 Common failure modes (symptoms → fixes)

Symptom	Likely cause	Fix
“Connecting…” forever	No working ICE pair; TURN missing or blocked	Add TURN/TLS 443; verify relay candidates exist
Connects, no audio	Relay range blocked; gateway packet drop; codec mismatch	Open relay range; inspect webrtc-internals; check transcoding
Audio choppy at peak	Gateway CPU or NIC saturated	Scale out gateway; reduce transcoding; optimize ptime
Bot replies late	AI pipeline saturation	Autoscale STT/TTS; add queueing + degradation
Works in dev only	NAT/firewall differences in real networks	Test on mobile + corp networks; TURN/TLS

Quick debugging order (fastest wins first):

Check chrome://webrtc-internals selected candidate type (host/srflx/relay)
Force relay for one test call (prove NAT problem vs non-NAT problem)
Check TURN server logs and relay-port traffic (tcpdump)
Check gateway CPU and packet loss/jitter metrics
Check AI pipeline latency and rate limits

13) References & next reading

External reference (standards)

TURN standard: rfc-editor.org/rfc/rfc8656 (RFC 8656, which obsoletes the original TURN specification at rfc-editor.org/rfc/rfc5766)

Standards references help you validate behavior across vendors and implementations.
