@GitterDoneScott
Last active December 11, 2024 22:32
Containerized yt-dlp command that downloads the English subtitles (manual or auto-generated) for a YouTube video, strips them down to plain text, and prints the video title followed by the cleaned transcript to stdout. The output is intended as input for text-generation AI processes.
docker run --rm ghcr.io/jauderho/yt-dlp:latest \
  --quiet --no-warnings --ignore-config \
  --skip-download --write-subs --write-auto-subs \
  --sub-lang en --sub-format ttml --convert-subs srt \
  --exec before_dl:"echo %(title)q" \
  --exec before_dl:"cat %(requested_subtitles.:.filepath)#q \
    | sed '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] --> [0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]$/d' \
    | sed '/^[[:digit:]]\{1,3\}$/d' \
    | sed 's/<[^>]*>//g' \
    | sed '/^[[:space:]]*$/d'" \
  <YouTube video URL>
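
The four sed filters in the command above can be tried on their own, without Docker or a network connection. The sketch below is only an illustration: `clean_srt` is a hypothetical helper name, not part of the gist or of yt-dlp, and the sample cues are made up. It folds the four filters into one sed call with the same per-line behavior, and it matches the millisecond separator explicitly as `.` or `,` (the original pattern uses an unescaped `.`, which matches any character, including the comma SRT uses).

```shell
#!/bin/sh
# clean_srt (hypothetical helper): reads SRT text on stdin, writes plain text to stdout.
clean_srt() {
  # 1. drop cue timing lines such as "00:00:00,000 --> 00:00:02,000"
  # 2. drop cue index lines (one to three digits on a line by themselves)
  # 3. strip inline markup tags such as <i>...</i>
  # 4. drop lines that are empty or whitespace-only
  sed -e '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.,][0-9][0-9][0-9] --> [0-9][0-9]:[0-9][0-9]:[0-9][0-9][.,][0-9][0-9][0-9]$/d' \
      -e '/^[[:digit:]]\{1,3\}$/d' \
      -e 's/<[^>]*>//g' \
      -e '/^[[:space:]]*$/d'
}

# Made-up two-cue sample, fed through the filter.
printf '%s\n' \
  '1' \
  '00:00:00,000 --> 00:00:02,000' \
  '<i>Hello</i> world' \
  '' \
  '2' \
  '00:00:02,000 --> 00:00:04,500' \
  'Second line' | clean_srt
# prints:
# Hello world
# Second line
```

Two limits follow directly from the patterns: cue numbers above 999 are not removed, because the index filter only matches one to three digits, and a caption line that consists solely of a short number is removed along with the indices.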