podcast_reader

Convert podcast audio, YouTube videos, or X/Twitter videos into styled HTML transcripts. The tool uses whisper-ctranslate2 for audio files, youtube-transcript-api for YouTube (fetching existing captions instead of downloading the video), and yt-dlp for X/Twitter and other platforms.

Quick Start

# Transcribe from a YouTube video (uses existing captions, no download)
podcast-reader https://www.youtube.com/watch?v=VIDEO_ID "Episode Title"

# Transcribe from X/Twitter (downloads audio via yt-dlp)
podcast-reader https://x.com/user/status/123456 "Post Title"

# Transcribe from any yt-dlp-supported URL
podcast-reader https://vimeo.com/123456 "Video Title"

# Transcribe a local file
podcast-reader ~/Downloads/episode.mp3 "Episode Title"

# Specify output directory
podcast-reader --output-dir ./output https://example.com/video

Setup

Requires: Python 3.10+ and uv. An NVIDIA GPU is optional; without one, transcription falls back to CPU.

# Development
uv sync --dev

# Run directly
uv run podcast-reader <url-or-file> [title]

# Install as standalone tool
uv tool install .

For speaker diarization, set HF_TOKEN and accept model terms at:

Environment Variables

Variable	Default	Description
`WHISPER_MODEL`	`large-v3`	Whisper model size
`WHISPER_LANG`	`en`	Language code
`WHISPER_DEVICE`	`cuda`	`cuda` or `cpu`
`HF_TOKEN`	(none)	HuggingFace token for diarization
`ANTHROPIC_API_KEY`	(none)	Enables chapter generation via Claude
`SENTENCES`	`5`	Sentences per paragraph in HTML
`YT_DLP_COOKIES`	(none)	Path to cookies file for authenticated yt-dlp downloads
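
As a rough illustration of how the variables above could be read with their documented defaults, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names chosen for this example and are not taken from the package; only the variable names and default values come from the table.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; fields mirror the documented variables."""

    whisper_model: str
    whisper_lang: str
    whisper_device: str
    hf_token: str | None
    anthropic_api_key: str | None
    sentences: int
    yt_dlp_cookies: str | None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable from env, applying the defaults listed in the table."""
    return Settings(
        whisper_model=env.get("WHISPER_MODEL", "large-v3"),
        whisper_lang=env.get("WHISPER_LANG", "en"),
        whisper_device=env.get("WHISPER_DEVICE", "cuda"),
        # Unset or empty optional values are normalised to None.
        hf_token=env.get("HF_TOKEN") or None,
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        sentences=int(env.get("SENTENCES", "5")),
        yt_dlp_cookies=env.get("YT_DLP_COOKIES") or None,
    )
```

Passing the environment as a mapping keeps the function easy to exercise in unit tests, for example `load_settings({"WHISPER_DEVICE": "cpu"})`, without mutating `os.environ`.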

Package Structure

Module	Purpose
`src/podcast_reader/cli.py`	Main CLI entry point — URL routing, pipeline orchestration
`src/podcast_reader/youtube.py`	Fetch YouTube captions as whisper-compatible JSON
`src/podcast_reader/ytdlp.py`	Download audio from X/Twitter and other platforms via yt-dlp
`src/podcast_reader/transcribe.py`	Run whisper-ctranslate2 on audio files
`src/podcast_reader/chapters.py`	Generate chapter markers via Claude
`src/podcast_reader/html.py`	Convert whisper JSON to styled HTML with TOC, key points, pull quotes
`pyproject.toml`	Dependencies, entry point, tool configuration

Pipeline

YouTube URL → youtube.py fetches captions → whisper JSON
Other URL → ytdlp.py downloads audio → transcribe.py runs whisper → whisper JSON
Local file → transcribe.py runs whisper → whisper JSON
chapters.py → <stem>_chapters.json (if ANTHROPIC_API_KEY set)
html.py → <stem>.html (styled transcript with TOC, key points, pull quotes)
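
The routing step in the pipeline above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in `cli.py`: the function names `classify_source` and `output_paths`, the set of YouTube hostnames, and the returned labels are all hypothetical. Only the three routes and the `<stem>_chapters.json` / `<stem>.html` naming come from the description.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse

# Assumed list of hostnames treated as YouTube; the real check may differ.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return 'youtube', 'ytdlp', or 'local' for a URL or file path."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        # YouTube URLs use existing captions; other URLs go through yt-dlp.
        return "youtube" if host in YOUTUBE_HOSTS else "ytdlp"
    # Anything that is not an http(s) URL is treated as a local file.
    return "local"


def output_paths(stem: str, output_dir: Path) -> dict[str, Path]:
    """Build the generated artifact paths for a given file stem."""
    return {
        "chapters": output_dir / f"{stem}_chapters.json",
        "html": output_dir / f"{stem}.html",
    }
```

For example, `classify_source("https://vimeo.com/123456")` yields `"ytdlp"`, while `classify_source("~/Downloads/episode.mp3")` yields `"local"`.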

Development

Use uv for all Python package management, never raw pip.
Audio files, JSON, and HTML outputs are gitignored — they're generated artifacts.

Code Quality

# Run tests (unit only)
uv run pytest -m "not integration"

# Run all tests including integration
uv run pytest

# Type checking (strict mode)
uv run mypy src/

# Lint and format
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/

mypy: strict mode, all functions fully typed
ruff: line-length 100, rules E/F/W/I/N/UP/B/A/SIM/TCH
pytest: equality matchers preferred, subprocess mocked in unit tests, integration tests marked with @pytest.mark.integration
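
To illustrate the unit-test conventions above (plain equality assertions, `subprocess` mocked), here is a small sketch. The `run_whisper` wrapper is hypothetical and exists only to give the test something to call; the real `transcribe.py` may be structured quite differently. An integration test would additionally carry the `@pytest.mark.integration` marker and call the real CLI.

```python
import subprocess
from unittest.mock import patch


def run_whisper(audio_path: str, model: str = "large-v3") -> int:
    """Hypothetical wrapper around the whisper-ctranslate2 CLI (illustration only)."""
    result = subprocess.run(
        ["whisper-ctranslate2", audio_path, "--model", model],
        check=False,
    )
    return result.returncode


def test_run_whisper_invokes_cli() -> None:
    # Replace subprocess.run so no real process is started in a unit test.
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = subprocess.CompletedProcess(args=[], returncode=0)
        assert run_whisper("episode.mp3") == 0

    # Equality assertion on the exact command that would have been executed.
    assert mock_run.call_args.args[0] == [
        "whisper-ctranslate2",
        "episode.mp3",
        "--model",
        "large-v3",
    ]
```

Because the test never starts a process, it would be selected by `uv run pytest -m "not integration"`.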

GitNexus — Code Intelligence

This project is indexed by GitNexus as podcast_reader (54 symbols, 111 relationships, 3 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

If any GitNexus tool warns that the index is stale, run npx gitnexus analyze in a terminal first.

Always Do

MUST run impact analysis before editing any symbol. Before modifying a function, class, or method, run gitnexus_impact({target: "symbolName", direction: "upstream"}) and report the blast radius (direct callers, affected processes, risk level) to the user.
MUST run gitnexus_detect_changes() before committing to verify your changes only affect expected symbols and execution flows.
MUST warn the user if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
When exploring unfamiliar code, use gitnexus_query({query: "concept"}) to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use gitnexus_context({name: "symbolName"}).

When Debugging

gitnexus_query({query: "<error or symptom>"}) — find execution flows related to the issue
gitnexus_context({name: "<suspect function>"}) — see all callers, callees, and process participation
READ gitnexus://repo/podcast_reader/process/{processName} — trace the full execution flow step by step
For regressions: gitnexus_detect_changes({scope: "compare", base_ref: "main"}) — see what your branch changed

When Refactoring

Renaming: MUST use gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true}) first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with dry_run: false.
Extracting/Splitting: MUST run gitnexus_context({name: "target"}) to see all incoming/outgoing refs, then gitnexus_impact({target: "target", direction: "upstream"}) to find all external callers before moving code.
After any refactor: run gitnexus_detect_changes({scope: "all"}) to verify only expected files changed.

Never Do

NEVER edit a function, class, or method without first running gitnexus_impact on it.
NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
NEVER rename symbols with find-and-replace — use gitnexus_rename which understands the call graph.
NEVER commit changes without running gitnexus_detect_changes() to check affected scope.

Tools Quick Reference

Tool	When to use	Command
`query`	Find code by concept	`gitnexus_query({query: "auth validation"})`
`context`	360-degree view of one symbol	`gitnexus_context({name: "validateUser"})`
`impact`	Blast radius before editing	`gitnexus_impact({target: "X", direction: "upstream"})`
`detect_changes`	Pre-commit scope check	`gitnexus_detect_changes({scope: "staged"})`
`rename`	Safe multi-file rename	`gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})`
`cypher`	Custom graph queries	`gitnexus_cypher({query: "MATCH ..."})`

Impact Risk Levels

Depth	Meaning	Action
d=1	WILL BREAK — direct callers/importers	MUST update these
d=2	LIKELY AFFECTED — indirect deps	Should test
d=3	MAY NEED TESTING — transitive	Test if critical path

Resources

Resource	Use for
`gitnexus://repo/podcast_reader/context`	Codebase overview, check index freshness
`gitnexus://repo/podcast_reader/clusters`	All functional areas
`gitnexus://repo/podcast_reader/processes`	All execution flows
`gitnexus://repo/podcast_reader/process/{name}`	Step-by-step execution trace

Self-Check Before Finishing

Before completing any code modification task, verify:

gitnexus_impact was run for all modified symbols
No HIGH/CRITICAL risk warnings were ignored
gitnexus_detect_changes() confirms changes match expected scope
All d=1 (WILL BREAK) dependents were updated

CLI

Re-index: npx gitnexus analyze
Check freshness: npx gitnexus status
Generate docs: npx gitnexus wiki
