How to transcribe a podcast in 2026 (the free and paid options)
Transcribing a podcast in 2026 costs either nothing or three cents a minute. Here's the exact toolchain — free and paid — plus which formats make the output actually usable.
The Koydo Distill team
Updated Apr 16, 2026
TL;DR
- Free: self-hosted Whisper. Best quality per dollar, worst UX.
- Cheapest paid: ~$0.006/min via OpenAI's API. Solid quality, no UX.
- Best UX: Koydo Distill or Otter. Slightly more expensive but usable immediately.
- For study/research, skip pure transcription tools and go straight to ones that add summaries.
Podcast transcription used to be the kind of task you either did by hand (painful) or paid $1.50 a minute to a transcription service (expensive). In 2026, the same task costs between zero and three cents per minute depending on how much convenience you want. The tooling is fully commoditized. What matters now is what you do with the transcript.
This guide walks through the three viable paths — fully free, cheap paid, and premium paid — and then spends the second half on the question most transcription guides skip: how to actually turn a transcript into something you can use.
Path 1: Free (self-hosted Whisper)
OpenAI open-sourced Whisper in 2022. In 2026, Whisper large-v3 and its descendants are still among the most accurate open-source speech recognition models. Setup is straightforward if you have a Python environment:
- Install Python 3.11+ and ffmpeg, then run pip install faster-whisper.
- Download a podcast mp3 (yt-dlp handles most platforms, including YouTube podcasts).
- Run: from faster_whisper import WhisperModel; model = WhisperModel("large-v3"); segments, _ = model.transcribe("episode.mp3"). Then iterate over segments and read each one's start, end, and text.
- For diarization (speaker labels), run pyannote.audio 4.0 on the same audio and match its speaker turns to the transcript segments by timestamp.
Total cost: zero, assuming you have a computer with a decent GPU. On a 2024 MacBook with Apple Silicon, a 90-minute podcast transcribes in about 10 minutes with faster-whisper. On an older laptop, expect 30–45 minutes.
Path 2: Cheap paid (OpenAI Whisper API)
If you don't want to fuss with local install, OpenAI's hosted Whisper endpoint charges $0.006 per minute of audio as of 2026. A 90-minute podcast: 54 cents. No GPU needed, no local disk space, just an API call.
Caveats: no diarization out of the box (you'll need to run pyannote separately if you need speaker labels), and you still need to build the frontend — the API returns a plain transcript, not a UI. Good for scripts and batch jobs, bad for one-off personal use.
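For a batch job, the cost arithmetic and the API call might look roughly like the sketch below. The function names are hypothetical, the price constant is simply the per-minute rate quoted above, and the call assumes the official openai Python package with an OPENAI_API_KEY set in the environment.

```python
from decimal import Decimal

# USD per minute of audio, taken from the rate quoted in this guide.
PRICE_PER_MINUTE = Decimal("0.006")


def transcription_cost(minutes: int) -> Decimal:
    """Estimated cost in USD for a given audio length."""
    return PRICE_PER_MINUTE * minutes


def transcribe_via_api(path: str) -> str:
    """Send one audio file to the hosted Whisper endpoint and return its text."""
    # Imported lazily so the cost helper works without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```

Here transcription_cost(90) works out to 0.54 dollars, matching the figure above. Long episodes may exceed the endpoint's upload size limit, so check the current limit and split or compress the audio first if needed.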
Path 3: Premium paid (Distill, Otter, Rev)
$6–$17/month buys you a polished app, built-in diarization, searchable archive, and in most cases summary or study features layered on top of the raw transcript. This is where the real time savings are for most users — not the two cents per minute you'd save going the API route, but the zero minutes you spend building glue code.
Koydo Distill ($6/mo) and Otter.ai ($17/mo Pro) are the two we'd recommend for most users. Distill if you actually want to study from podcast content — medical podcasts, educational lectures, language learning. Otter if you want a pure transcript to paste into Notion and move on. Rev ($19.99/mo) is a fine third option, mostly for journalists.
Upload an mp3 or paste a YouTube URL — Distill transcribes, summarizes, and generates flashcards in one pass.
From transcript to useful output
Here's the part most transcription guides skip. A raw transcript — even a perfectly accurate one — is barely more useful than the original audio. It's 15,000 words of unstructured speech. You can't study it, you can't search it well, you can't quote from it efficiently. Three things turn transcripts into real artifacts:
1. Diarization with named speakers
Diarization labels each segment with who said it. Without this, a podcast between two people reads as one continuous monologue. With it, you can quote, skim by speaker, and spot who's making which claim.
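If you run transcription and diarization as separate steps, you end up with two lists that need to be merged: transcript segments and speaker turns. One simple way to do this, sketched below, is to give each segment the speaker whose turn overlaps it the most. The tuple layouts, the SPEAKER_00-style labels, and the function names are assumptions for illustration; real tools may use more careful word-level alignment.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Assign each transcript segment the speaker with the largest overlap.

    segments: list of (start, end, text)
    turns:    list of (start, end, speaker_label)
    """
    labeled = []
    for s_start, s_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(s_start, s_end, t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, text))
    return labeled


def render_dialogue(labeled, names=None):
    """Merge consecutive segments by the same speaker into one line each."""
    names = names or {}
    lines = []
    previous = None
    for speaker, text in labeled:
        name = names.get(speaker, speaker)
        if name == previous:
            lines[-1] += " " + text
        else:
            lines.append(f"{name}: {text}")
        previous = name
    return "\n".join(lines)
```

The names mapping is where "named speakers" comes in: once you know SPEAKER_00 is the host, pass {"SPEAKER_00": "Host"} and every line picks up the real name.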
2. Chapter breakdown
A long-form podcast has 4–8 natural topic shifts. Good AI tools detect these and generate chapter markers with 5–10 word titles. Now the transcript is navigable — you can jump to the section you actually care about.
3. Summary + key quotes
A 400-word summary of the 15,000-word transcript. The best tools pair the summary with 5–10 key verbatim quotes, each linked to the source timestamp. This is what makes a transcript useful for research, reporting, or study — the ability to scan the summary, spot the interesting part, and jump to the exact source in one click.
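Linking a quote back to its timestamp is mostly bookkeeping. The sketch below shows one possible approach, assuming you have timestamped segments with whitespace-trimmed text and a quote that appears verbatim in the transcript: join the segment texts, find the quote, and map the character position back to the segment where it starts. The function names and the (start_seconds, text) layout are hypothetical.

```python
from bisect import bisect_right


def build_index(segments):
    """Join segment texts with spaces and record where each one starts.

    segments: list of (start_seconds, text), with text already stripped.
    """
    offsets, parts, position = [], [], 0
    for _start, text in segments:
        offsets.append(position)
        parts.append(text)
        position += len(text) + 1  # +1 for the joining space
    return " ".join(parts), offsets


def quote_timestamp(quote, segments):
    """Return the start time of the segment in which the quote begins."""
    joined, offsets = build_index(segments)
    hit = joined.find(quote)
    if hit == -1:
        return None  # the quote is not verbatim, so there is nothing to link
    segment_index = bisect_right(offsets, hit) - 1
    return segments[segment_index][0]
```

Because the search runs over the joined text, a quote that spans two segments still resolves to the segment where it begins. Returning None for a miss is also a cheap check that a generated "quote" really is verbatim.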
Legality and ethics (briefly)
Two rules. First, transcribing a podcast for personal use (to study, to reference, to quote briefly in your own writing) generally qualifies as fair use in the US, and many other jurisdictions have comparable private-use or quotation exceptions. Second, republishing a full transcript generally does not. Podcasters own their content; if you want to post a transcript publicly, ask.
Some podcasts release official transcripts as part of their accessibility or SEO strategy. If you're building a tool that aggregates transcripts, only use officially released ones or get explicit permission. The AI Act in the EU and similar emerging regulations in the US are likely to make unauthorized republishing increasingly risky.
Our pick
For personal use — studying from educational podcasts, taking notes on interviews, building a personal knowledge base — we recommend Distill. The $6/mo price reflects what you're actually getting: not the transcription (that costs pennies), but the workflow around it. Summaries, flashcards, concept maps, and search, all in one place.
For journalism or professional research where your workflow ends at the transcript, Otter. For engineering projects where you need programmatic access, the OpenAI Whisper API. For the price-sensitive engineer who enjoys tinkering, self-hosted Whisper.
Try Distill free
Turn your next lecture into study material.
Transcripts, summaries, flashcards, concept maps, and quizzes — all generated in under two minutes. 10 lectures/month free, no credit card.