Free Audio Translator
Translate audio from 99 languages into English text, right in your browser. Drop in a Spanish podcast, a Mandarin interview, a French lecture — get English text with timestamps. Download as .txt, .srt, or .vtt. No upload. No sign-up.
Drop an audio or video file
or click to browse. Any of 99 languages. Works best with files under 30 min on most browsers. Files are capped at 60 min; split longer files first with our Audio Splitter.
Don't have a file? Record one with our voice recorder to test how translation works.
100% in your browser. Audio stays on your device. The Whisper AI model downloads once (~40 MB) from a public CDN, then runs locally for every translation. We can't access your audio because it never leaves your computer. Privacy policy.
Best with clear speech. Uncheck Translate to English for source-language transcription. Keep this tab open — we'll chime if you switch tabs. Models cache after first download. How models compare →
Translation
Translate audio to English — free, private, browser-based
SnipSound's Audio Translator uses OpenAI's open-source Whisper speech-translation model running entirely in your browser via WebAssembly. Drop in a Spanish podcast, a Mandarin interview, a French lecture, or an Arabic voice memo, and Whisper renders it as English text with timestamps. The first time you click Translate, your browser downloads a ~40 MB AI model from a public CDN; after that, every translation runs locally and your audio is never uploaded.
What it's great for
- English subtitles for foreign-language videos. Drop in a foreign-language clip you've downloaded and get a .srt file with English subtitles that you can upload to YouTube.
- English transcripts of foreign-language interviews for journalists, researchers, analysts.
- Understanding voice messages you received in a language you don't speak.
- Studying foreign-language audio for language learning — toggle between source-language and English views.
- Privacy-sensitive content — therapy notes, confidential interviews, internal meetings recorded in another language.
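For readers curious about the subtitle downloads mentioned above, here is a minimal sketch of how timestamped segments could be turned into .srt and .vtt text. It is an illustration only, not SnipSound's actual code: the `Segment` shape and the function names `formatTimestamp`, `toSrt`, and `toVtt` are hypothetical. The timestamp layouts themselves (`HH:MM:SS,mmm` for SRT, `HH:MM:SS.mmm` for WebVTT) follow the two formats' conventions.

```typescript
// Hypothetical segment shape: start/end in seconds plus the English text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.start, ",") + " --> " + formatTimestamp(seg.end, ",") + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}

// WebVTT: "WEBVTT" header, dot before milliseconds, cue numbers optional.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    formatTimestamp(seg.start, ".") + " --> " + formatTimestamp(seg.end, ".") + "\n" +
    seg.text.trim() + "\n");
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Calling `toSrt(segments)` on a list of segments gives text you could save as a .srt file; `toVtt(segments)` does the same for .vtt.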
What it's not so good for
- Heavy background noise, music behind voice, multiple overlapping speakers.
- Low-resource languages. Quality varies widely: Lao, Maori, and Yiddish work, but the results are rougher than for Spanish or Mandarin.
- Idiomatic or culture-specific phrasing. The tiny model tends to translate literally.
- Files longer than 60 minutes — capped to protect browser RAM.
- Translation INTO another non-English language. Whisper only translates TO English. Transcribe in the source language, then use Google Translate or DeepL for the second leg.
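If you are unsure how to handle a long recording, the length limits above (best under 30 minutes, capped at 60) can be sketched as a small planning helper. This is a hypothetical illustration, not how the Audio Splitter works: the names `adviseOnLength` and `planSplit` are made up, and treating a file of exactly 60 minutes as allowed is an assumption.

```typescript
// Limits taken from the page text; exact boundary handling is an assumption.
const RECOMMENDED_MAX_MIN = 30;
const HARD_CAP_MIN = 60;

type LengthAdvice = "ok" | "may-be-slow" | "split-first";

// Classify a recording by its duration in seconds.
function adviseOnLength(durationSec: number): LengthAdvice {
  if (durationSec > HARD_CAP_MIN * 60) return "split-first";
  if (durationSec > RECOMMENDED_MAX_MIN * 60) return "may-be-slow";
  return "ok";
}

interface Part {
  start: number; // seconds from the beginning of the original file
  end: number;
}

// Plan equal-length parts, each no longer than maxPartMin minutes.
function planSplit(durationSec: number, maxPartMin: number = RECOMMENDED_MAX_MIN): Part[] {
  const maxPartSec = maxPartMin * 60;
  const count = Math.max(1, Math.ceil(durationSec / maxPartSec));
  const partLen = durationSec / count;
  const parts: Part[] = [];
  for (let i = 0; i < count; i++) {
    // Pin the final end to the true duration to avoid floating-point drift.
    const end = i === count - 1 ? durationSec : (i + 1) * partLen;
    parts.push({ start: i * partLen, end: end });
  }
  return parts;
}
```

For a 90-minute file, `planSplit(5400)` yields three 30-minute parts. Each part's timestamps restart at zero, so add the part's `start` offset if you merge the translated text back together.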
How it compares to Cockatoo, Otter, Rev
Cockatoo, Otter, Rev, Trint, and Sonix all run larger speech-recognition models on their servers. Their quality is meaningfully higher, especially on heavy accents, multi-speaker audio, and low-resource languages. They charge roughly $10-30/month or $1/minute because GPU servers cost money. SnipSound's advantage: free, no sign-up, no upload. Use this when privacy or cost matters more than maximum accuracy.
Need transcription instead?
Uncheck Translate to English for source-language transcription, or use the dedicated Audio Transcription tool.