Created
June 25, 2025 22:34
transcribe-and-translate-with-openai-whisper.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"provenance": [], | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
}, | |
"accelerator": "GPU" | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/chepo92/8a30a477c49a2197957271568a64ad82/transcribe-and-translate-with-openai-whisper.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Modified by [Axel Sepulveda](https://chepo92.github.io/)\n", | |
"\n", | |
"Original by [Jason Boog](https://medium.com/@jasonboog)\n", | |
"\n", | |
"You can use Whisper for free on a cloud GPU with this Google Colab notebook.\n", | |
"\n", | |
"[Whisper](https://openai.com/es-ES/index/whisper/) from OpenAI is a general-purpose speech recognition model that you can use to transcribe or translate audio files.\n", | |
"\n", | |
"Thanks to ByteXD for [this video introduction](https://youtu.be/-KyqrwdTsN0).\n", | |
"\n", | |
"To learn more about Whisper, visit [this GitHub repository](https://github.com/openai/whisper).\n" | |
], | |
"metadata": { | |
"id": "sICPfKnZCTVZ" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# General Instructions\n", | |
"\n", | |
"Before you start, save a copy of this Google Colab notebook to your Google Drive:\n", | |
"open the File menu and choose \"Save a copy in Drive\".\n", | |
"\n", | |
"Open your copy of the Colab notebook in Google Chrome (or your preferred browser) and follow the steps below to transcribe and translate.\n", | |
"\n", | |
"How does it work? This code can run in the cloud or locally on your computer, but running it locally requires a compatible GPU (graphics card), so the cloud option is the one that works for everyone.\n", | |
"\n", | |
"What do you need? A computer with internet access and an audio or video file in a supported format. The typical formats of audio and video recorded with your phone should work.\n", | |
"\n", | |
"The preliminary steps (1 and 2) install a few libraries (they are installed in the cloud if you are using that option).\n", | |
"\n", | |
"Next, upload the file(s) you want to transcribe,\n", | |
"select the file to transcribe, and run the transcription.\n", | |
"\n", | |
"After a few minutes, a text file appears in the side panel with the same name as your original file plus the suffix _output." | |
], | |
"metadata": { | |
"id": "mMD5MYscC45V" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# How to Run in the Cloud\n", | |
"\n", | |
"To run a code cell, click the run button next to it, or click inside the cell and press Ctrl+Enter on the keyboard.\n", | |
"\n", | |
"When a cell has finished running, a green check mark appears on its left side.\n" | |
], | |
"metadata": { | |
"id": "PO401jJVGRLL" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Installation\n", | |
"\n", | |
"You only need to run the installation steps once each time you connect to a cloud runtime." | |
], | |
"metadata": { | |
"id": "Krwi_ZtnDhtW" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Step One: Install Whisper\n", | |
"\n", | |
"This step installs the latest commit from the OpenAI repository on GitHub. Just run the cell below to install.\n", | |
"\n" | |
], | |
"metadata": { | |
"id": "DEZezug-DzPB" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "XRRPAS8t1DXJ", | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"!pip install git+https://github.com/openai/whisper.git" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Step Two: Install ffmpeg\n", | |
"\n", | |
"You need to install [ffmpeg](https://ffmpeg.org), a cross-platform solution to record, convert and stream audio and video. Just run the cell to install." | |
], | |
"metadata": { | |
"id": "2E86sIkRE96A" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"!sudo apt update && sudo apt install -y ffmpeg" | |
], | |
"metadata": { | |
"id": "-YsDcsci1xRE", | |
"collapsed": true | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Step Three: Upload Your File\n", | |
"\n", | |
"Open the file folder on the left-hand side of your Colab notebook. Drag the .mp3 you would like to transcribe into the \"Files\" section.\n", | |
"\n", | |
"This uploads the audio to the temporary storage of your Colab runtime, not to your Google Drive, so the file is removed when the runtime is recycled.\n" | |
], | |
"metadata": { | |
"id": "FHH08HaDFQEs" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Intermediate Step: Choose the Whisper Model\n", | |
"\n", | |
"You will normally use turbo so that processing is faster, but if the result is not good enough, try the medium or large models to see whether it improves.\n", | |
"\n", | |
"Either way, this cell must be run at least once." | |
], | |
"metadata": { | |
"id": "BRa722p4J2OV" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"#!whisper \"Clase 31 may, señalizacion 2.m4a\" --model medium\n", | |
"# Python\n", | |
"import whisper\n", | |
"\n", | |
"model = whisper.load_model(\"turbo\") # options: tiny, base, small, medium, large, turbo\n" | |
], | |
"metadata": { | |
"id": "-1pP466t8kpv" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Step Four: Select and Transcribe\n", | |
"\n", | |
"To choose the file to process, copy and paste the name of your file into the code below and run the cell.\n", | |
"\n", | |
"Don't forget the quotation marks and the file extension! In this example the file is called audio.m4a.\n", | |
"The output file gets the suffix \"_output\" and the extension .txt. You can find it in the file list in the left-hand menu and download it to your computer from there." | |
], | |
"metadata": { | |
"id": "D7j3VhEsGc4G" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
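"import os\n", | |
"\n", | |
"audio_file_name = \"audio.m4a\"  # name of the uploaded file, including its extension\n", | |
"\n", | |
"result = model.transcribe(audio_file_name)\n", | |
"\n", | |
"# Build the output name from the base name so extensions of any length work\n", | |
"file_path = os.path.splitext(audio_file_name)[0] + \"_output.txt\"\n", | |
"with open(file_path, 'w', encoding='utf-8') as file:\n", | |
"    file.write(result[\"text\"])" | |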
], | |
"metadata": { | |
"id": "o8V7WGTCZBji" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"The following steps are commented out and optional. They translate in addition to transcribing, and process several files in a batch." | |
], | |
"metadata": { | |
"id": "IqvtA8W5MC6H" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Step Five: Translate\n", | |
"\n", | |
"To translate an .mp3, simply copy and paste the title of your .mp3 file into the command line below and run the cell.\n", | |
"\n", | |
"You can change the language as needed in the command line, and English is the default output. Don't forget the quotation marks!\n", | |
"\n", | |
"Note: I created and successfully ran every step of this notebook with my Colab Pro subscription.\n", | |
"\n", | |
"On the basic Google Colab plan, all the steps worked except \"Step Five.\" This final step kept getting a \"Runtime discontinued\" error on the basic plan." | |
], | |
"metadata": { | |
"id": "TbeClltmKIEZ" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# !whisper \"librodepoemas_04_garcialorca.mp3\" --task translate" | |
], | |
"metadata": { | |
"id": "DnZqS_5D2lhp" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# License\n", | |
"\n", | |
"MIT License\n", | |
"\n", | |
"Copyright (c) 2022 Jason Boog\n", | |
"\n", | |
"Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n", | |
"\n", | |
"The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n", | |
"\n", | |
"THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE." | |
], | |
"metadata": { | |
"id": "L9jzU75GL1N3" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Batch Processing\n", | |
"\n", | |
"Transcribe files in a batch. Set the file extension, and the script processes every file in the folder with that extension and creates an output .txt file for each one.\n" | |
], | |
"metadata": { | |
"id": "eJwez6jz4Sew" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"#import os\n", | |
"#directory_path = './'#'/content/drive/MyDrive/my_folder'\n", | |
"#target_extension = '.m4a'\n", | |
"\n", | |
"#filtered_files = [filename for filename in os.listdir(directory_path) if filename.endswith(target_extension)]\n", | |
"\n", | |
"#print(len(filtered_files))\n", | |
"#print(filtered_files)\n" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "UjSxEDwE54Bu", | |
"outputId": "ff851645-49c6-4bdc-bd25-6464f3c74219" | |
}, | |
"execution_count": null, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"0\n", | |
"[]\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"#import whisper\n", | |
"\n", | |
"#model = whisper.load_model(\"turbo\") # options: tiny, base, small, medium, large, turbo\n", | |
"#for iteration, filename in enumerate(filtered_files, start=1):\n", | |
"#    print(\"Iteration: \" + str(iteration))\n", | |
"#    # join with directory_path so files outside the working directory are found\n", | |
"#    audio_file_name = os.path.join(directory_path, filename)\n", | |
"#    result = model.transcribe(audio_file_name)\n", | |
"\n", | |
"#    file_path = os.path.splitext(audio_file_name)[0] + \"_output.txt\"\n", | |
"#    with open(file_path, 'w', encoding='utf-8') as file:\n", | |
"#        file.write(result[\"text\"])\n", | |
"#print(\"All files were processed\")\n" | |
], | |
"metadata": { | |
"id": "_6JbLZiU4b3D" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
} | |
] | |
} |
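
The notebook above names each transcript after its audio file with an "_output" suffix, and its batch step picks files by extension. As a minimal sketch of those two pieces using only the Python standard library, the helpers below derive the output name from the file's base name and list the files that match an extension. The names `output_path_for` and `list_audio_files` are hypothetical and are not part of the original notebook or of Whisper.

```python
from pathlib import Path


def output_path_for(audio_file_name: str) -> str:
    """Return '<base name>_output.txt' next to the audio file (hypothetical helper)."""
    path = Path(audio_file_name)
    # path.stem drops only the final extension, whatever its length
    return str(path.with_name(path.stem + "_output.txt"))


def list_audio_files(directory: str, extension: str) -> list[str]:
    """List files in `directory` whose names end with `extension` (hypothetical helper)."""
    return sorted(
        str(p)
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.lower().endswith(extension.lower())
    )


print(output_path_for("audio.m4a"))
print(output_path_for("lecture.flac"))
```

Either helper could stand in for the corresponding line in the transcription or batch cells, for example `file_path = output_path_for(audio_file_name)`.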