Whisper Large V3 Turbo vs V3: Speed, Accuracy, and Local Use
See how large-v3-turbo reduces decoder depth for faster transcription, where quality can differ, and how to benchmark both models on Mac, iPhone, and iPad.

Key takeaways
- Turbo is an optimized large-v3 model with about 809M parameters and a 4-layer decoder instead of 32.
- Speed gains do not guarantee identical quality across every language and audio condition.
- Use turbo as a practical default candidate and compare large-v3 on genuinely difficult samples.
What changed in large-v3-turbo
Large-v3-turbo is a fine-tuned version of a pruned large-v3: the decoder shrinks from 32 layers to 4, while the encoder is unchanged. OpenAI lists roughly 809M parameters for turbo versus about 1550M for large-v3. The encoder still performs the acoustic representation work, so the shorter decoder mainly cuts the cost of generating each token. The result is higher throughput, at the price of a quality trade-off that is small on many benchmarks but not uniform across languages and audio conditions.
Measure speed under reproducible conditions
Use the same hardware, runtime, precision, and decoding parameters for both models. Time cold start, model load, first segment, and total transcription separately, and report the median of several runs. State the recording length and the convention when quoting a real-time factor: processing 60 minutes of audio in 12 minutes is 5× faster than real time, or a real-time factor (RTF) of 0.2 when RTF is defined as processing time divided by audio duration. Short clips mostly measure loading; long clips reveal sustained throughput.
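The measurement procedure above can be sketched as a small timing harness. This is a minimal illustration, not a benchmark tool: `load` and `transcribe` are hypothetical callables that wrap whichever runtime is under test, and the harness only separates model load from repeated transcription runs before reporting the median.

```python
import statistics
import time
from typing import Any, Callable, Dict


def benchmark(
    load: Callable[[], Any],
    transcribe: Callable[[Any], Any],
    audio_seconds: float,
    runs: int = 5,
) -> Dict[str, float]:
    """Time model load once, then time `runs` transcriptions of the same audio.

    `load` and `transcribe` are placeholders (assumptions) for the runtime
    being tested; `audio_seconds` is the duration of the recording.
    """
    if runs < 1 or audio_seconds <= 0:
        raise ValueError("runs must be >= 1 and audio_seconds must be > 0")

    # Model load is reported separately so it does not distort throughput.
    start = time.perf_counter()
    model = load()
    load_seconds = time.perf_counter() - start

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(model)
        timings.append(time.perf_counter() - start)

    median = statistics.median(timings)
    return {
        "load_seconds": load_seconds,
        "median_seconds": median,
        # RTF here means processing time divided by audio duration.
        "rtf": median / audio_seconds,
        # Speed factor is the inverse: how many times faster than real time.
        "speed_factor": audio_seconds / median if median > 0 else float("inf"),
    }


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the processing ran."""
    return audio_seconds / processing_seconds
```

With the numbers from the text, `speed_factor(60 * 60, 12 * 60)` gives 5.0. The first timed run may still include warm-up effects, so it can be worth logging the individual timings as well as the median.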
Similar averages can hide different errors
Turbo can remain close to large-v3 on many benchmarks while differing by language, accent, noise, and segmentation. Track word error rate (WER) or character error rate (CER), and separately check names, numbers, negations, repeated phrases, and hallucinations during silence or music. A faster model is not faster overall if its output demands substantially more correction.
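As a reference for the metrics named above, here is a minimal sketch of WER and CER built on edit distance. It assumes simple lowercasing and whitespace tokenization, which is a simplification: real evaluations usually apply a fuller text normalizer, and the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    ref = reference.lower()
    hyp = hypothesis.lower()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Note how little a single aggregate number says about severity: transcribing "do not ship it" as "do ship it" is a WER of only 0.25, yet it inverts the meaning. That is why negations, names, and numbers deserve their own checks.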
Choose between turbo and large-v3 on Mac and iPhone
Turbo is a strong starting point for meetings, interviews, and batch jobs on Mac. Re-run difficult recordings with large-v3 when memory and time allow. Mobile devices face tighter memory, heat, battery, and background-execution limits, so turbo or a smaller model is often more practical. Record the model revision and quantization with each transcript so results can be reproduced and compared later.
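One way to keep model revision and quantization attached to each transcript is to store them in a small record next to the text. The sketch below is illustrative only: the field names and the example values are assumptions, not a format defined by any Whisper runtime.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TranscriptRecord:
    """A transcript plus the settings needed to reproduce it (hypothetical schema)."""
    text: str
    model_name: str       # e.g. "large-v3-turbo" or "large-v3"
    model_revision: str   # whatever revision identifier the runtime exposes
    quantization: str     # precision or quantization label used at inference
    runtime: str          # name and version of the inference runtime
    device: str           # hardware the transcript was produced on


def to_json(record: TranscriptRecord) -> str:
    """Serialize the record with stable key order so files diff cleanly."""
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)


def from_json(payload: str) -> TranscriptRecord:
    """Rebuild a record from its JSON form."""
    return TranscriptRecord(**json.loads(payload))
```

Saving this record alongside each transcript makes it possible to tell later whether two differing outputs came from different models, different quantization, or a runtime upgrade.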
A practical default-model rule
Transcribe a representative sample with turbo first. Keep it as the default when critical entities are correct, hallucinations are controlled, and runtime meets the target. Compare large-v3 only on the segments where turbo consistently fails. Re-run a fixed regression set after every model or runtime upgrade rather than relying on a vendor’s best number.
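The rule above can be written down as a small check over per-recording results. This is a sketch under stated assumptions: the `SampleResult` fields, the zero-error default thresholds, and the function names are hypothetical and should be replaced with whatever a given project actually measures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleResult:
    """Turbo's result on one representative recording (hypothetical fields)."""
    name: str
    entity_errors: int          # wrong names, numbers, or negations
    hallucinated_segments: int  # invented text during silence or music
    rtf: float                  # processing time divided by audio duration


def passes_quality(r: SampleResult, max_entity_errors: int = 0,
                   max_hallucinated_segments: int = 0) -> bool:
    """True when critical entities and hallucinations are within limits."""
    return (r.entity_errors <= max_entity_errors
            and r.hallucinated_segments <= max_hallucinated_segments)


def keep_turbo_as_default(results: List[SampleResult], target_rtf: float,
                          max_entity_errors: int = 0,
                          max_hallucinated_segments: int = 0) -> bool:
    """Keep turbo only if every sample meets both quality and runtime targets."""
    if not results:
        raise ValueError("results must contain at least one sample")
    return all(
        passes_quality(r, max_entity_errors, max_hallucinated_segments)
        and r.rtf <= target_rtf
        for r in results
    )


def samples_to_compare(results: List[SampleResult], max_entity_errors: int = 0,
                       max_hallucinated_segments: int = 0) -> List[str]:
    """Names of recordings where turbo fails on quality and large-v3 is worth a run."""
    return [
        r.name for r in results
        if not passes_quality(r, max_entity_errors, max_hallucinated_segments)
    ]
```

Running the same functions over a fixed regression set after each model or runtime upgrade turns the rule into a repeatable check rather than a one-off judgment.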
Frequently asked questions
What is the main difference between large-v3-turbo and large-v3?
Turbo reduces the large-v3 decoder from 32 layers to 4, lowering parameter count and increasing speed. Accuracy differences depend on language and audio.
Is turbo suitable for speech translation?
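OpenAI notes that turbo is not trained for translation tasks. Turbo is itself multilingual for transcription, so the limitation is specific to translation: to translate non-English speech into English, use one of the other multilingual Whisper models, such as large-v3, and validate the output.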
Should iPhone always use the largest Whisper model?
No. Mobile devices must balance memory, battery, heat, and waiting time. Test turbo or a smaller model first on representative recordings.