@midudev
Created October 15, 2022 17:52
Transcribe a YouTube video with Whisper and Artificial Intelligence

Requirements

You need Python 3.9 installed, plus the Whisper and PyTube dependencies:

pip install git+https://github.com/openai/whisper.git
pip install pytube

You also need ffmpeg installed. The install command depends on your operating system:

# Ubuntu
sudo apt update && sudo apt install ffmpeg
# Arch Linux
sudo pacman -S ffmpeg
# macOS with Homebrew (https://brew.sh/)
brew install ffmpeg
# Windows with Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# Windows with Scoop (https://scoop.sh/)
scoop install ffmpeg
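
Before running the script, it can help to confirm that ffmpeg is actually reachable from Python. The following is a minimal sketch using only the standard library; the helper name `find_ffmpeg` is hypothetical and not part of the gist.

```python
import shutil


def find_ffmpeg():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it with one of the commands above")
    else:
        print(f"ffmpeg found at {path}")
```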

How to use the command line

You need to provide the URL of the YouTube video you want to transcribe:

python3 transcript.py -h

python3 transcript.py --video "https://www.youtube.com/watch?v=oHrjAbDanpw"

# you can also choose the AI model that Whisper will use
# the larger the model, the longer the first download takes
python3 transcript.py --video "https://www.youtube.com/watch?v=oHrjAbDanpw" --model "large"
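
The two flags above could also be validated by argparse itself, so that a missing URL or a mistyped model name fails before anything is downloaded. This is only a sketch: `build_parser` and `MODEL_SIZES` are hypothetical names, and the list of sizes is an assumption that may not match every Whisper release.

```python
import argparse

# Assumed model sizes; check the Whisper README for the names your version supports.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def build_parser():
    """Build a parser that requires --video and restricts --model to known sizes."""
    parser = argparse.ArgumentParser(description="Transcribe a YouTube video using Whisper")
    parser.add_argument("--video", required=True, help="YouTube URL to transcribe")
    parser.add_argument("--model", choices=MODEL_SIZES, default="small", help="Whisper model size")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration only.
args = build_parser().parse_args(
    ["--video", "https://www.youtube.com/watch?v=oHrjAbDanpw", "--model", "large"]
)
print(args.model)
```

With `required=True`, argparse prints the usage message and exits when `--video` is missing, so a manual check on the argument is no longer needed.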
# transcript.py
import argparse
import logging
import sys

import pytube
import whisper

parser = argparse.ArgumentParser(description="Transcribe a YouTube video using Whisper")
parser.add_argument("--video", help="Pass the YouTube URL to transcribe")
parser.add_argument("--model", help="Indicate the Whisper model to download", default="small")
args = parser.parse_args()

# Log to stdout with a timestamp and level on every line
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)

if not args.video:
    logging.error("Please pass a YouTube URL to transcribe")
    sys.exit(1)

# The model is downloaded the first time it is requested
logging.info("Downloading Whisper model")
model = whisper.load_model(args.model)

logging.info("Downloading the video from YouTube...")
youtubeVideo = pytube.YouTube(args.video)

# Keep only the audio track and save it to a temporary file
logging.info("Get only the audio from the video")
audio = youtubeVideo.streams.get_audio_only()
audio.download(filename="tmp.mp4")

logging.info("Transcribe the audio")
result = model.transcribe("tmp.mp4")
print(result["text"])
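
The script prints only `result["text"]`, but the dictionary returned by `model.transcribe` also carries a `segments` list whose entries have `start`, `end`, and `text` keys. As a rough sketch, those segments could be turned into SRT subtitles; the helper names `format_timestamp` and `segments_to_srt` are hypothetical and not part of the gist.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to an SRT timestamp such as 00:01:02,500."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render Whisper-style segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)
```

In the script, the output of `segments_to_srt(result["segments"])` could then be written to a file such as `transcript.srt` (an illustrative file name).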
@EdixonAlberto

You can run the code quickly from this Colab I made: Open In Colab

@lolokino3

python3 transcript.py --video "https://www.youtube.com/watch?v=wgOAMKX5MAE" --model "large"

@MLFDev01

MLFDev01 commented May 1, 2023

KeyError                                  Traceback (most recent call last)
<ipython-input-2-62cec1f2ee37> in <cell line: 37>()
     35 
     36 logging.info("Get only the audio from the video")
---> 37 audio = youtubeVideo.streams.get_audio_only()
     38 audio.download(filename='tmp.mp4')
     39 

2 frames
/usr/local/lib/python3.10/dist-packages/pytube/__main__.py in streams(self)
    294         """
    295         self.check_availability()
--> 296         return StreamQuery(self.fmt_streams)
    297 
    298     @property

/usr/local/lib/python3.10/dist-packages/pytube/__main__.py in fmt_streams(self)
    174         self._fmt_streams = []
    175 
--> 176         stream_manifest = extract.apply_descrambler(self.streaming_data)
    177 
    178         # If the cached js doesn't work, try fetching a new js file

/usr/local/lib/python3.10/dist-packages/pytube/__main__.py in streaming_data(self)
    159         else:
    160             self.bypass_age_gate()
--> 161             return self.vid_info['streamingData']
    162 
    163     @property

KeyError: 'streamingData'
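
The traceback above shows the `KeyError` being raised inside pytube's `streaming_data` property, when the video info it received has no `streamingData` entry. A minimal sketch of how the script could catch this and report it cleanly instead of crashing follows; the helper name `get_audio_stream` is hypothetical, and retrying with a newer pytube release is an assumption rather than a confirmed fix.

```python
import logging


def get_audio_stream(youtube_video):
    """Return the audio-only stream, or None if pytube cannot read the stream data."""
    try:
        return youtube_video.streams.get_audio_only()
    except KeyError as error:
        # Accessing .streams is what triggers the KeyError seen in the traceback.
        logging.error("pytube could not read the stream data (missing key %s)", error)
        return None
```

The caller would then check for `None` and exit with an error message before attempting the download.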
