Voice AI Infrastructure

Voice AI Pipeline Latency

Benchmarks run daily · Results vary by region, load, and time of day

Today's fastest measurements · Ottawa, Canada
Fastest Pipeline E2E
groq → groq → cartesia
as of today · varies by region
137ms
Fastest LLM TTFT
llama-3.1-8b via groq
235ms
Fastest STT
groq whisper turbo
122ms
Fastest TTS TTFB
deepgram aura-asteria

Benchmark Results

Best results across all regions. Run with modelping.

Ottawa, Canada · April 5, 2026 · 5 runs · modelping v0.1.0
groq/whisper-large-v3-turbo → groq/llama-3.3-70b → cartesia/sonic-2
STT 336ms · LLM 140ms · TTS 173ms
Total: 1644ms wall-clock
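
The per-stage figures in this run add up to less than the wall-clock total. A minimal Python sketch of that arithmetic follows. The page does not say what the remaining time covers, so labelling it "unattributed" is an assumption, and the variable names are illustrative only.

```python
# Per-stage latencies from the run above, in milliseconds.
stages_ms = {"STT": 336, "LLM": 140, "TTS": 173}
wall_clock_ms = 1644

stage_sum_ms = sum(stages_ms.values())
# Time not explained by the three reported stage figures (assumed label).
unattributed_ms = wall_clock_ms - stage_sum_ms

print(f"sum of stages: {stage_sum_ms}ms")
print(f"unattributed:  {unattributed_ms}ms "
      f"({unattributed_ms / wall_clock_ms:.0%} of wall-clock)")
```

The stages sum to 649ms, leaving 995ms of the 1644ms wall-clock time outside the three reported figures. Plausible contributors include generation after the first token, audio streaming, and network round trips, but the source does not break this down.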
STT — Speech-to-Text
LLM — Language Model
TTS — Text-to-Speech
LLM
Model                     Provider    TTFT P50   TTFT P95   Tok/s
kimi-k2                   groq        109ms      157ms      175.3
llama-3.1-8b-instant      groq        114ms      130ms      387.0
llama-4-scout-17b         groq        166ms      275ms      234.7
llama-3.3-70b-versatile   groq        327ms      378ms      181.6
gpt-4o                    openai      380ms      451ms      81.6
claude-haiku-4-5          anthropic   463ms      1700ms     68.8
gpt-4o-mini               openai      508ms      549ms      44.0
gemini-2.5-flash          google      936ms      1100ms     45.2
o3-mini                   openai      956ms      1508ms     70.1
claude-sonnet-4-5         anthropic   1361ms     1792ms     31.4
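
TTFT alone does not say how soon a usable chunk of text is ready for the TTS stage. The sketch below combines TTFT P50 and Tok/s from the LLM table to estimate the time until a short first sentence has streamed. It assumes the first token lands at TTFT and the rest arrive at the steady Tok/s rate. The 20-token sentence length and the function name `time_to_n_tokens_ms` are hypothetical choices for illustration, not something the page specifies.

```python
# (TTFT P50 in ms, throughput in tokens/s), copied from the LLM table.
LLM_ROWS = {
    "kimi-k2": (109, 175.3),
    "llama-3.1-8b-instant": (114, 387.0),
    "gpt-4o": (380, 81.6),
    "gpt-4o-mini": (508, 44.0),
}


def time_to_n_tokens_ms(ttft_ms: float, tok_per_s: float, n_tokens: int) -> float:
    """Estimated ms until n_tokens have streamed.

    Assumes the first token lands at TTFT and the remaining tokens
    arrive at the steady Tok/s rate; real streams are burstier.
    """
    return ttft_ms + (n_tokens - 1) / tok_per_s * 1000.0


N_TOKENS = 20  # hypothetical length of a short first sentence

estimates = {
    model: time_to_n_tokens_ms(ttft, tps, N_TOKENS)
    for model, (ttft, tps) in LLM_ROWS.items()
}
for model, ms in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{model:<22} {ms:7.1f}ms")
```

Under these assumptions llama-3.1-8b-instant reaches 20 tokens before kimi-k2 despite its slightly higher TTFT, because its throughput is more than twice as high.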
STT
Model                    Provider     Latency P50
whisper-large-v3         groq         360ms
whisper-large-v3-turbo   groq         527ms
nova-3                   deepgram     583ms
nova-2                   deepgram     622ms
gpt-4o-transcribe        openai       799ms
universal-2              assemblyai   1725ms
universal-3-pro          assemblyai   2855ms
default                  gladia       3452ms
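
The tables report P50 and P95 figures from a small number of runs. The page does not state which percentile method is used, so the sketch below uses the nearest-rank definition purely as an assumption. The sample latencies are made up for illustration and are not measured data.

```python
import math


def nearest_rank_percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of the data at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies from five runs, in ms (not measured data).
runs_ms = [350, 372, 355, 410, 360]

p50 = nearest_rank_percentile(runs_ms, 50)
p95 = nearest_rank_percentile(runs_ms, 95)
print(f"P50 {p50}ms, P95 {p95}ms")
```

With only five samples, the nearest-rank P95 is simply the slowest run, so a single outlier determines it. That is worth keeping in mind when comparing P95 columns from short benchmark runs.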
TTS
Model             Provider     TTFB P50   Realtime Factor
aura-luna         deepgram     156ms      2.7x
sonic-2           cartesia     200ms      3.9x
sonic-english     cartesia     210ms      1.9x
aurora            lmnt         298ms      2.8x
blizzard          lmnt         310ms      3.5x
flash-v2.5        elevenlabs   374ms      11.3x
tts-1             openai       1050ms     3.6x
multilingual-v2   elevenlabs   1277ms     4.0x
tts-1-hd          openai       2010ms     2.3x
default           fish-audio   2511ms     2.9x
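
The page does not define "Realtime Factor". A common reading, assumed here, is seconds of audio produced per second of generation time, so anything above 1.0x can stream faster than playback. The sketch below turns that assumed definition into a rough generation time for a reply. The 10-second clip length and the `provider/model` labels are illustrative.

```python
# (TTFB P50 in ms, realtime factor), copied from the TTS table.
TTS_ROWS = {
    "cartesia/sonic-2": (200, 3.9),
    "cartesia/sonic-english": (210, 1.9),
    "elevenlabs/flash-v2.5": (374, 11.3),
}


def synthesis_time_s(audio_s: float, realtime_factor: float) -> float:
    """Seconds needed to generate audio_s seconds of speech.

    Assumes realtime factor = audio duration / generation time.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_s / realtime_factor


AUDIO_S = 10.0  # hypothetical length of the spoken reply

for model, (ttfb_ms, rtf) in TTS_ROWS.items():
    gen_s = synthesis_time_s(AUDIO_S, rtf)
    print(f"{model:<24} first byte {ttfb_ms}ms, full clip in about {gen_s:.2f}s")
```

For a voice agent, TTFB usually matters more than total generation time, as long as the realtime factor stays comfortably above 1.0x so playback never catches up with synthesis.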

Submit Your Benchmarks

Run modelping and share your results. Add your region to the leaderboard.


Request a Benchmark

We'll benchmark your model or endpoint and send you a private report.


Pipeline Latency Matrix

Total end-to-end latency for common STT + LLM + TTS combinations.

STT \ TTS   Deepgram Luna   Cartesia Sonic-2   LMNT Aurora   LMNT Blizzard   ElevenLabs Flash   OpenAI TTS-1
            156ms           200ms              298ms         310ms           374ms              1050ms
Your stack:
modelping pipeline --stt groq/whisper-large-v3 --llm groq/kimi-k2 --tts cartesia/sonic-2
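
The page does not state how the matrix cells are computed. One simple reading, assumed here, is that each cell adds the STT latency P50, the selected LLM's TTFT P50, and the TTS TTFB P50. The sketch below builds such a matrix from figures in the tables on this page. The dictionary keys are plain labels for this example, not necessarily valid modelping identifiers, and `latency_matrix` is a hypothetical helper.

```python
# P50 figures in ms, copied from the tables on this page.
# Keys are plain labels for this sketch, not modelping identifiers.
STT_P50_MS = {
    "groq whisper-large-v3": 360,
    "groq whisper-large-v3-turbo": 527,
    "deepgram nova-3": 583,
}
LLM_TTFT_P50_MS = {"groq kimi-k2": 109, "groq llama-3.1-8b-instant": 114}
TTS_TTFB_P50_MS = {
    "Deepgram Luna": 156,
    "Cartesia Sonic-2": 200,
    "LMNT Aurora": 298,
}


def latency_matrix(llm: str) -> dict[str, dict[str, int]]:
    """STT rows by TTS columns; each cell = STT + LLM TTFT + TTS TTFB (assumed additive)."""
    llm_ms = LLM_TTFT_P50_MS[llm]
    return {
        stt: {tts: stt_ms + llm_ms + tts_ms for tts, tts_ms in TTS_TTFB_P50_MS.items()}
        for stt, stt_ms in STT_P50_MS.items()
    }


matrix = latency_matrix("groq kimi-k2")

# Find the lowest cell across the whole matrix.
fastest = min(
    ((stt, tts, ms) for stt, row in matrix.items() for tts, ms in row.items()),
    key=lambda cell: cell[2],
)
print(matrix["groq whisper-large-v3"]["Cartesia Sonic-2"])
print(fastest)
```

With kimi-k2 selected, the stack from the command above (whisper-large-v3, kimi-k2, sonic-2) comes to 669ms under this additive model, and the lowest cell in this reduced matrix is whisper-large-v3 with Deepgram Luna at 625ms. A sum of P50s is an estimate of time to first audio, not a measured wall-clock figure.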

What we're building

Done
LLM benchmarking — TTFT, throughput, cost
STT benchmarking — transcription latency
TTS benchmarking — time to first audio byte
STT+LLM+TTS pipeline benchmark
Community leaderboard
Coming
Network routing latency
Concurrency benchmarks
Hardware inference benchmarks

Supported Providers

Missing a provider? Request it →

LLM
OpenAI
Anthropic
Google
Groq
Fireworks
Together AI
Mistral
Cohere
STT
Groq Whisper
OpenAI Whisper
Deepgram
AssemblyAI
Gladia
TTS
ElevenLabs
Cartesia
Fish Audio
PlayHT
Deepgram Aura
LMNT