Speech-to-Text
AI Models
Transcribe audio locally with zero API costs
- Fully offline after model download
- SRT, VTT, JSON, and text output
- Translate any language to English
What You Can Do
Local transcription — Convert speech to text completely offline, no API key required
Multiple model sizes — tiny (fastest) → base → small → medium → large (most accurate)
Output formats — Plain text, SRT subtitles, VTT captions, or JSON with timestamps
Translation mode — Translate any language audio directly to English text
Wide format support — WAV, MP3, M4A, FLAC, OGG, and more
Auto model caching — Downloads models on first use, fully offline after that
Try Asking
"Transcribe this podcast.mp3 using the medium model"
"Convert this interview to SRT subtitles"
"Transcribe my voice memo and translate it to English"
"Generate VTT captions for this video's audio track"
"Use the large model for this important lecture recording"
"Get JSON output with word-level timestamps"Pro Tips
tiny = fast but rough, small = good balance, medium = professional quality, large = maximum accuracy
The first run downloads the selected model (40 MB to 3 GB, depending on size); after that, transcription runs fully offline
SRT/VTT formats include timestamps for subtitle syncing
Translation mode outputs English regardless of input language
JSON output includes segment-level and word-level timing data
Works completely offline after initial model download — great for privacy
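To make the link between the timing data and the subtitle formats concrete, here is a minimal Python sketch of how segment-level timestamps could be rendered as SRT and VTT cues. The segment shape (a dict with `start` and `end` in seconds plus `text`) and the function names are assumptions made for illustration; they are not this tool's confirmed JSON schema or API. The two formats differ mainly in the header and the decimal separator: SRT numbers each cue and writes milliseconds after a comma, while VTT starts with a `WEBVTT` line and uses a dot.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render a time in seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments):
    """Build SRT text: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


def to_vtt(segments):
    """Build WebVTT text: WEBVTT header, dot before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical segments, shaped the way this sketch assumes.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the show."},
    {"start": 2.5, "end": 6.04, "text": " Today we talk about captions."},
]

print(to_srt(example_segments))
print(to_vtt(example_segments))
```

If the JSON output uses different field names or time units, only the dictionary lookups need to change; the timestamp formatting stays the same.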