Benchmarks run daily · Results vary by region, load, and time of day
Best results across all regions. Run with modelping.
| Model | Provider | TTFT P50 | TTFT P95 | Tok/s |
|---|---|---|---|---|
| kimi-k2 | groq | 109ms | 157ms | 175.3 |
| llama-3.1-8b-instant | groq | 114ms | 130ms | 387.0 |
| llama-4-scout-17b | groq | 166ms | 275ms | 234.7 |
| llama-3.3-70b-versatile | groq | 327ms | 378ms | 181.6 |
| gpt-4o | openai | 380ms | 451ms | 81.6 |
| claude-haiku-4-5 | anthropic | 463ms | 1700ms | 68.8 |
| gpt-4o-mini | openai | 508ms | 549ms | 44.0 |
| gemini-2.5-flash | google | 936ms | 1100ms | 45.2 |
| o3-mini | openai | 956ms | 1508ms | 70.1 |
| claude-sonnet-4-5 | anthropic | 1361ms | 1792ms | 31.4 |

| Model | Provider | Latency P50 |
|---|---|---|
| whisper-large-v3 | groq | 360ms |
| whisper-large-v3-turbo | groq | 527ms |
| nova-3 | deepgram | 583ms |
| nova-2 | deepgram | 622ms |
| gpt-4o-transcribe | openai | 799ms |
| universal-2 | assemblyai | 1725ms |
| universal-3-pro | assemblyai | 2855ms |
| default | gladia | 3452ms |

| Model | Provider | TTFB P50 | Realtime Factor |
|---|---|---|---|
| aura-luna | deepgram | 156ms | 2.7x |
| sonic-2 | cartesia | 200ms | 3.9x |
| sonic-english | cartesia | 210ms | 1.9x |
| aurora | lmnt | 298ms | 2.8x |
| blizzard | lmnt | 310ms | 3.5x |
| flash-v2.5 | elevenlabs | 374ms | 11.3x |
| tts-1 | openai | 1050ms | 3.6x |
| multilingual-v2 | elevenlabs | 1277ms | 4.0x |
| tts-1-hd | openai | 2010ms | 2.3x |
| default | fish-audio | 2511ms | 2.9x |

Run modelping and share your results. Add your region to the leaderboard.
We'll benchmark your model or endpoint and send you a private report.
Total end-to-end latency for common STT + LLM + TTS combinations. Select an LLM to recalculate.
| STT \ TTS | Deepgram Luna 156ms | Cartesia Sonic-2 200ms | LMNT Aurora 298ms | LMNT Blizzard 310ms | ElevenLabs Flash 374ms | OpenAI TTS-1 1050ms |
|---|---|---|---|---|---|---|
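
The page does not say how each matrix cell is computed. A minimal sketch, assuming a cell is the sum of the STT latency P50, the LLM TTFT P50, and the TTS TTFB P50 from the tables above, might look like the following. The function names `pipeline_latency_ms` and `latency_matrix` are hypothetical and not part of modelping. Note that a sum of medians only approximates the median of the end-to-end latency.

```python
# P50 values in milliseconds, copied from a subset of the tables above.
STT_P50_MS = {"whisper-large-v3": 360, "nova-3": 583, "gpt-4o-transcribe": 799}
LLM_TTFT_P50_MS = {"kimi-k2": 109, "gpt-4o": 380, "claude-haiku-4-5": 463}
TTS_TTFB_P50_MS = {"aura-luna": 156, "sonic-2": 200, "flash-v2.5": 374}


def pipeline_latency_ms(stt: str, llm: str, tts: str) -> int:
    """Estimate pipeline latency as STT P50 + LLM TTFT P50 + TTS TTFB P50.

    Assumption: the three stages run strictly one after another, so their
    median latencies are simply added.
    """
    return STT_P50_MS[stt] + LLM_TTFT_P50_MS[llm] + TTS_TTFB_P50_MS[tts]


def latency_matrix(llm: str) -> dict[str, dict[str, int]]:
    """Build an STT-by-TTS matrix of estimated latencies for one LLM."""
    return {
        stt: {tts: pipeline_latency_ms(stt, llm, tts) for tts in TTS_TTFB_P50_MS}
        for stt in STT_P50_MS
    }


if __name__ == "__main__":
    # Recalculate the matrix for a chosen LLM, one row per STT model.
    for stt, row in latency_matrix("kimi-k2").items():
        print(stt, row)
```

Changing the argument to `latency_matrix` plays the role of selecting a different LLM: only the middle term changes, so every cell shifts by the same amount.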