@flaviodelgrosso
Created June 6, 2024 19:46
insanely-fast-whisper.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/flaviodelgrosso/7de34a4fa30e751e2cfc93590a70cf25/insanely-fast-whisper.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Trh9GZ7eFkx0"
},
"source": [
"## Step 1 (optional): Mount your Google Drive in Google Colab. You can upload audio to your Drive or directly to the Colab environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JLfGIzTw8iNE"
},
"outputs": [],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZJJPofGxFkx2"
},
"source": [
"## Step 2: Install pipx and the python3.10-venv package, which pipx needs to create virtual environments"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wSCFWbni8lSn"
},
"outputs": [],
"source": [
"!pip install -q pipx && apt install -y python3.10-venv"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-iPoXZ6MFkx2"
},
"source": [
"## Step 3: Run insanely-fast-whisper, passing the audio file path with --file-name. To diarise the audio clips with Pyannote.audio you need a Hugging Face authentication token. Pass it to the command with --hf-token YOUR_HF_TOKEN"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wXEmcqzy8obT"
},
"outputs": [],
"source": [
"!pipx run insanely-fast-whisper --file-name /content/drive/MyDrive/audio_example.wav --hf-token \"YOUR_HF_TOKEN\""
]
},
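{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional check before Step 4: a minimal sketch for peeking at the transcript. It assumes the command above wrote its result to output.json in the current working directory (the path Step 5 reads) and that each entry under 'chunks' has 'text' and 'timestamp' keys, as the conversion code expects. Adjust the path if your run wrote the file elsewhere."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from pathlib import Path\n",
"\n",
"# Assumption: the transcript is at ./output.json, the path Step 5 reads.\n",
"transcript_path = Path(\"output.json\")\n",
"\n",
"if transcript_path.exists():\n",
"    with transcript_path.open(encoding=\"utf-8\") as f:\n",
"        data = json.load(f)\n",
"\n",
"    chunks = data.get(\"chunks\", [])\n",
"    print(f\"{len(chunks)} chunks\")\n",
"\n",
"    # Show the first few chunks: (start, end) in seconds, then the text.\n",
"    for chunk in chunks[:3]:\n",
"        print(chunk.get(\"timestamp\"), chunk.get(\"text\"))\n",
"else:\n",
"    print(f\"{transcript_path} not found; check the output of the previous cell.\")"
]
},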
{
"cell_type": "markdown",
"metadata": {
"id": "034TbNAOFkx2"
},
"source": [
"## Step 4: Define the Python code that converts the JSON output to SRT, VTT, or TXT"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "A1wIXDv-p-Y4"
},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"\n",
"\n",
"class TxtFormatter:\n",
"    @classmethod\n",
"    def preamble(cls):\n",
"        return \"\"\n",
"\n",
"    @classmethod\n",
"    def format_chunk(cls, chunk, index):\n",
"        # Plain text output: one chunk per line, no timestamps.\n",
"        text = chunk['text']\n",
"        return f\"{text}\\n\"\n",
"\n",
"\n",
"class SrtFormatter:\n",
"    @classmethod\n",
"    def preamble(cls):\n",
"        return \"\"\n",
"\n",
"    @classmethod\n",
"    def format_seconds(cls, seconds):\n",
"        # Work in whole milliseconds so float error cannot drop a millisecond.\n",
"        total_ms = round(seconds * 1000)\n",
"        hours, remainder = divmod(total_ms, 3_600_000)\n",
"        minutes, remainder = divmod(remainder, 60_000)\n",
"        secs, milliseconds = divmod(remainder, 1000)\n",
"\n",
"        # SRT separates seconds and milliseconds with a comma.\n",
"        return f\"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}\"\n",
"\n",
"    @classmethod\n",
"    def format_chunk(cls, chunk, index):\n",
"        text = chunk['text']\n",
"        start, end = chunk['timestamp'][0], chunk['timestamp'][1]\n",
"        # The end timestamp may be missing (None); fall back to the start time.\n",
"        if end is None:\n",
"            end = start\n",
"        start_format, end_format = cls.format_seconds(start), cls.format_seconds(end)\n",
"        return f\"{index}\\n{start_format} --> {end_format}\\n{text}\\n\\n\"\n",
"\n",
"\n",
"class VttFormatter:\n",
"    @classmethod\n",
"    def preamble(cls):\n",
"        # Every WebVTT file starts with this header line.\n",
"        return \"WEBVTT\\n\\n\"\n",
"\n",
"    @classmethod\n",
"    def format_seconds(cls, seconds):\n",
"        # Work in whole milliseconds so float error cannot drop a millisecond.\n",
"        total_ms = round(seconds * 1000)\n",
"        hours, remainder = divmod(total_ms, 3_600_000)\n",
"        minutes, remainder = divmod(remainder, 60_000)\n",
"        secs, milliseconds = divmod(remainder, 1000)\n",
"\n",
"        # WebVTT separates seconds and milliseconds with a period.\n",
"        return f\"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}\"\n",
"\n",
"    @classmethod\n",
"    def format_chunk(cls, chunk, index):\n",
"        text = chunk['text']\n",
"        start, end = chunk['timestamp'][0], chunk['timestamp'][1]\n",
"        # The end timestamp may be missing (None); fall back to the start time.\n",
"        if end is None:\n",
"            end = start\n",
"        start_format, end_format = cls.format_seconds(start), cls.format_seconds(end)\n",
"        return f\"{index}\\n{start_format} --> {end_format}\\n{text}\\n\\n\"\n",
"\n",
"\n",
"def convert(input_path, output_format, output_dir, verbose):\n",
"    formatter_class = {\n",
"        'srt': SrtFormatter,\n",
"        'vtt': VttFormatter,\n",
"        'txt': TxtFormatter\n",
"    }.get(output_format)\n",
"\n",
"    # Fail early with a clear message instead of an AttributeError on None.\n",
"    if formatter_class is None:\n",
"        raise ValueError(f\"Unsupported output format: {output_format!r}\")\n",
"\n",
"    with open(input_path, 'r', encoding='utf-8') as file:\n",
"        data = json.load(file)\n",
"\n",
"    string = formatter_class.preamble()\n",
"    for index, chunk in enumerate(data['chunks'], 1):\n",
"        entry = formatter_class.format_chunk(chunk, index)\n",
"\n",
"        if verbose:\n",
"            print(entry)\n",
"\n",
"        string += entry\n",
"\n",
"    # The result is written to output.<format> inside output_dir.\n",
"    with open(os.path.join(output_dir, f\"output.{output_format}\"), 'w', encoding='utf-8') as file:\n",
"        file.write(string)\n",
"\n"
]
},
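{
"cell_type": "markdown",
"metadata": {},
"source": [
"Worked example for the two timestamp formats: SRT writes HH:MM:SS,mmm and WebVTT writes HH:MM:SS.mmm. The cell below is a standalone sketch. to_timestamp is a hypothetical helper used only for illustration, not part of the formatter classes, and it assumes rounding to whole milliseconds."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def to_timestamp(seconds, separator):\n",
"    # Hypothetical helper: convert seconds to HH:MM:SS<separator>mmm.\n",
"    total_ms = round(seconds * 1000)\n",
"    hours, remainder = divmod(total_ms, 3_600_000)\n",
"    minutes, remainder = divmod(remainder, 60_000)\n",
"    secs, milliseconds = divmod(remainder, 1000)\n",
"    return f\"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{milliseconds:03d}\"\n",
"\n",
"\n",
"# 3661.5 s is 1 hour, 1 minute, 1 second, and 500 milliseconds.\n",
"print(to_timestamp(3661.5, \",\"))  # SRT style: 01:01:01,500\n",
"print(to_timestamp(3661.5, \".\"))  # WebVTT style: 01:01:01.500"
]
},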
{
"cell_type": "markdown",
"metadata": {
"id": "l1SmUR2-Fkx3"
},
"source": [
"## Step 5: Run the conversion. Usage: convert(input_path, output_format, output_dir, verbose), where output_format is 'srt', 'vtt', or 'txt'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qa_7qz-x4lUA"
},
"outputs": [],
"source": [
"convert(\"output.json\", \"vtt\", \"/content\", False)"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "L4",
"machine_shape": "hm",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}