Last active December 11, 2024 22:32
Gist: GitterDoneScott/f41c97b46fe72769a26ce0b9cfda02b6
Containerized yt-dlp command that downloads YouTube transcripts, cleans them, and writes the result to stdout. The purpose of this gist is to feed the resulting output into text-generation AI processes.
docker run --rm ghcr.io/jauderho/yt-dlp:latest --quiet --no-warnings --ignore-config --skip-download --write-subs --write-auto-subs --sub-lang en --sub-format ttml --convert-subs srt --exec before_dl:"echo %(title)q" --exec before_dl:"cat %(requested_subtitles.:.filepath)#q | sed '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] --> [0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]$/d' | sed '/^[[:digit:]]\{1,3\}$/d' | sed 's/<[^>]*>//g' | sed '/^[[:space:]]*$/d' " <YouTube video URL>
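To see what the cleaning stage does without Docker or yt-dlp, the four sed filters from the command above can be tried on their own. The following is only a sketch: the function names `clean_srt` and `sample_srt` are hypothetical helpers, and the sample cues are made up for illustration rather than taken from a real video.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): the same four sed filters
# used in the command above, wrapped in a function for experimentation.
clean_srt() {
  # 1. drop SRT timestamp lines such as "00:00:00,000 --> 00:00:02,500"
  # 2. drop cue index lines made of one to three digits
  # 3. strip inline markup tags such as <font ...> or <i>
  # 4. drop empty or whitespace-only lines
  sed '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] --> [0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]$/d' |
    sed '/^[[:digit:]]\{1,3\}$/d' |
    sed 's/<[^>]*>//g' |
    sed '/^[[:space:]]*$/d'
}

# Made-up sample cues in SRT layout (assumption: illustrative data only).
sample_srt() {
  printf '%s\n' \
    '1' \
    '00:00:00,000 --> 00:00:02,500' \
    '<font color="#ffffff">Hello and welcome.</font>' \
    '' \
    '2' \
    '00:00:02,500 --> 00:00:05,000' \
    'Today we look at yt-dlp.' \
    ''
}

# Prints only the two spoken lines, one per line.
sample_srt | clean_srt
```

Two properties of the filters are worth keeping in mind when reading the output. The index filter only matches lines of one to three digits, so cue numbers from 1000 upward would pass through unchanged on very long transcripts, and a caption line that consists solely of a short number would be removed along with the indices.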