TRANSFORMERS_SRC_PATH = "/admin/home/eustache_lebihan/dev/benchmark-whisper/transformers-fix/src"
import sys
sys.path.insert(0, TRANSFORMERS_SRC_PATH)
import wandb
from tqdm import tqdm
import evaluate
import os
import torch
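The preamble above puts the local `transformers-fix/src` checkout at the front of `sys.path`, presumably so that a later `import transformers` resolves to that patched source tree instead of any pip-installed copy. The sketch below illustrates the lookup-order rule with a throwaway module; `shadowdemo` and both directory names are hypothetical stand-ins, not part of the original script.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_module_dir(root: Path, label: str) -> Path:
    """Create a directory holding a hypothetical module named `shadowdemo`."""
    d = root / label
    d.mkdir()
    (d / "shadowdemo.py").write_text(f"SOURCE = {label!r}\n")
    return d


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    installed = make_module_dir(root, "site_packages_copy")
    checkout = make_module_dir(root, "local_checkout")

    # Simulate an installed copy somewhere later on the search path...
    sys.path.append(str(installed))
    # ...and a local checkout prepended, as the benchmark preamble does.
    sys.path.insert(0, str(checkout))

    sys.modules.pop("shadowdemo", None)  # ensure no cached copy is reused
    importlib.invalidate_caches()
    import shadowdemo

    resolved_from = shadowdemo.SOURCE
    print(resolved_from)  # → local_checkout

    # Undo the changes so the demo does not leak into later imports.
    sys.path.remove(str(installed))
    sys.path.remove(str(checkout))
    del sys.modules["shadowdemo"]
```

The insertion only affects imports that happen afterwards: a module already in `sys.modules` is returned from that cache without searching the path again, so the path change has to run before the first `import transformers` (or anything that imports it indirectly).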
OPENAI_SRC_PATH = "/admin/home/eustache_lebihan/dev/benchmark-whisper/whisper"
import sys
sys.path.insert(0, OPENAI_SRC_PATH)
import wandb
from tqdm import tqdm
import evaluate
import os
import torch
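This variant points `sys.path` at a checkout of the OpenAI `whisper` source instead, presumably to benchmark the reference implementation. The preview does not show which speed metric gets logged to `wandb`; a common choice for ASR benchmarks is the inverse real-time factor (RTFx), the seconds of audio transcribed per second of compute. A minimal sketch under that assumption; `rtfx` and `timed_rtfx` are hypothetical helpers.

```python
import time
from typing import Callable, Sequence


def rtfx(audio_seconds: float, compute_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per wall-clock second."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def timed_rtfx(transcribe: Callable[[Sequence[float]], str],
               samples: Sequence[float], sampling_rate: int) -> float:
    """Time a single call to `transcribe` and return its RTFx."""
    audio_seconds = len(samples) / sampling_rate
    start = time.perf_counter()
    transcribe(samples)
    elapsed = time.perf_counter() - start
    return rtfx(audio_seconds, elapsed)


print(rtfx(audio_seconds=60.0, compute_seconds=2.0))  # → 30.0
```

On a GPU, kernels launch asynchronously, so a real measurement would call `torch.cuda.synchronize()` before reading the clock; otherwise the timer can stop before the work is done.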
OPENAI_SRC_PATH = "/admin/home/eustache_lebihan/dev/benchmark-whisper/whisper-myfork"
import sys
sys.path.insert(0, OPENAI_SRC_PATH)
import wandb
from tqdm import tqdm
import evaluate
import os
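This preview points `sys.path` at `whisper-myfork` instead. With several checkouts of the same package on disk it is easy to benchmark the wrong one, so checking where a module will be loaded from is a cheap safeguard. A minimal sketch using `importlib.util.find_spec`, which locates a top-level module without executing it; `module_origin` is a hypothetical helper and the module names are examples.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# After sys.path.insert(0, OPENAI_SRC_PATH) this should point inside whisper-myfork,
# assuming the fork contains an importable `whisper` package (layout not shown).
print(module_origin("whisper"))
print(module_origin("json"))  # a stdlib module, for comparison
```

If the first line prints a `site-packages` path, the fork is not being picked up and the benchmark would measure the installed release instead.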
import torch
import evaluate
from transformers.models.whisper.english_normalizer import EnglishTextNormalizer
from transformers import MoonshineForConditionalGeneration, AutoProcessor, WhisperProcessor
from datasets import load_dataset, Audio
from tqdm import tqdm
import json
wer_metric = evaluate.load("wer")
device = "cuda:0" if torch.cuda.is_available() else "cpu"
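The script loads the `wer` metric from `evaluate` and imports Whisper's `EnglishTextNormalizer`, which is typically applied to both references and predictions before scoring. Word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, accumulated over the whole corpus. The pure-Python sketch below illustrates that definition; it is not the `evaluate` implementation (which delegates to `jiwer`), and the function names are hypothetical.

```python
from typing import List


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


refs = ["the cat sat on the mat", "hello world"]
hyps = ["the cat sat on mat", "hello there world"]
print(corpus_wer(refs, hyps))  # → 0.25 (one deletion + one insertion over 8 words)
```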
import torch
from transformers import MoonshineForConditionalGeneration, AutoProcessor
from tqdm import tqdm
import json
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float32
attn_implementation = "sdpa"
model = MoonshineForConditionalGeneration.from_pretrained("UsefulSensors/moonshine-tiny", attn_implementation=attn_implementation).to(device, torch_dtype)
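The Moonshine examples in the Transformers documentation, as we recall them, bound the generation length in proportion to clip duration (roughly 6.5 tokens per second of audio) so decoding cannot run far past the end of the speech. The helper below reproduces that arithmetic in plain Python; the function name, the 16 kHz default and the 6.5 factor are assumptions to check against your processor and the model card.

```python
import math
from typing import Sequence


def generation_max_length(num_samples: Sequence[int], sampling_rate: int = 16_000,
                          tokens_per_second: float = 6.5) -> int:
    """Cap decoder length in proportion to the longest clip in the batch.

    Both defaults are assumptions; read the real sampling rate from the
    processor's feature extractor.
    """
    longest = max(num_samples)
    return max(1, math.ceil(longest * tokens_per_second / sampling_rate))


# Longest clip is 10 s at 16 kHz, so at most 65 tokens.
print(generation_max_length([160_000, 48_000]))  # → 65
```

The result would be passed as `max_length=` to `model.generate(...)`. Using the longest clip means the cap never truncates the longest utterance in a padded batch, at the cost of a looser bound for the shorter ones.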
traces-*/
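This one-line gist is a `.gitignore` entry, presumably for profiler trace output. In gitignore syntax the trailing `/` restricts the pattern to directories, and because the pattern has no other slash it matches at any depth; everything inside a matched directory is ignored with it. The sketch below approximates that rule with `fnmatch`, matching one path component at a time (fnmatch's `*` would otherwise cross `/`); `is_ignored` is a hypothetical helper, not Git's implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

PATTERN = "traces-*"  # the gitignore entry without its directory-only "/"


def is_ignored(path: str, is_dir: bool) -> bool:
    """Approximate `traces-*/`: any directory named traces-* at any depth,
    plus everything underneath it."""
    parts = PurePosixPath(path).parts
    # Every parent component is a directory; the last one only if is_dir is True.
    dir_parts = parts if is_dir else parts[:-1]
    return any(fnmatchcase(p, PATTERN) for p in dir_parts)


print(is_ignored("traces-2024-06-01", is_dir=True))          # → True
print(is_ignored("runs/traces-a/trace.json", is_dir=False))  # → True
print(is_ignored("traces-notes.txt", is_dir=False))          # → False
print(is_ignored("my-traces-dir", is_dir=True))              # → False
```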
from typing import IO
from datatrove.io import get_datafolder
from datatrove.executor import SlurmPipelineExecutor
from datatrove.pipeline.readers import ParquetReader
from datatrove.pipeline.writers import ParquetWriter
from datatrove.utils.typeshelper import StatHints
class ParquetReaderInMemory(ParquetReader):
from datasets import load_dataset, Audio
from transformers import (
    CsmForConditionalGeneration,
    TrainingArguments,
    CsmProcessor,
    Trainer
)
processor = CsmProcessor.from_pretrained("eustlb/csm-1b")
model = CsmForConditionalGeneration.from_pretrained("eustlb/csm-1b")
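The fine-tuning preview stops before the data pipeline. A common convention when fine-tuning language models with `Trainer` is to pad `labels` with -100, because PyTorch's `CrossEntropyLoss` skips that index by default, so padding never contributes to the loss. A plain-Python sketch of that convention follows; `pad_batch` is hypothetical, handles plain token ids only, and is not what `CsmProcessor` actually produces (CSM inputs also involve audio codes).

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_batch(sequences: List[List[int]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad token sequences and build matching attention masks and labels."""
    width = max(len(s) for s in sequences)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = width - len(seq)
        batch["input_ids"].append(seq + [pad_id] * n_pad)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Padded positions get IGNORE_INDEX so they are excluded from the loss.
        batch["labels"].append(seq + [IGNORE_INDEX] * n_pad)
    return batch


print(pad_batch([[5, 6, 7], [8]], pad_id=0))
```

Labels can mirror `input_ids` here because causal-LM heads in Transformers generally shift labels by one position internally before computing the loss.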
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
from datasets import load_dataset
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "openai/whisper-large-v3-turbo"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
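This snippet loads `openai/whisper-large-v3-turbo` in float16 when a GPU is available and in float32 otherwise. Half precision mainly pays off on the GPU, where it halves the memory and bandwidth needed for the weights. The arithmetic is sketched below for a hypothetical 1-billion-parameter model (the checkpoint's real parameter count is not stated here); `weight_memory_gib` is an illustrative helper that ignores activations and caches.

```python
from typing import Dict

BYTES_PER_PARAM: Dict[str, int] = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


n = 1_000_000_000  # hypothetical parameter count, chosen for round numbers
for dtype in ("float32", "float16"):
    print(dtype, round(weight_memory_gib(n, dtype), 2))
# float32 3.73
# float16 1.86
```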
# TEST GREEDY FLOAT 32
# make sure to clone git@github.com:eustlb/csm.git and check out the compare-trfms branch
import sys
sys.path.insert(0, "./csm")
from generator import load_csm_1b, Segment
from datasets import load_dataset, Audio
from huggingface_hub import hf_hub_download
import torch
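This preview appears to be a parity test: it runs the reference CSM generator from the `compare-trfms` branch with greedy decoding in float32. Greedy decoding involves no sampling, so two faithful implementations should emit identical token sequences unless a numerical difference flips an arg-max, and reporting the first divergent step localizes the problem. The toy sketch below shows that comparison; `reference_impl` and `ported_impl` are hypothetical stand-ins for the two implementations, not CSM code.

```python
from typing import Callable, List, Optional, Sequence


def greedy_decode(next_logits: Callable[[List[int]], Sequence[float]],
                  bos: int, eos: int, max_new_tokens: int) -> List[int]:
    """Append the arg-max token at every step until EOS or the length limit."""
    tokens = [bos]
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Index of the first mismatching token, or None if the sequences are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def reference_impl(tokens: List[int]) -> List[float]:
    scores = [0.0] * 5
    scores[(tokens[-1] + 1) % 5] = 1.0  # toy rule: always prefer the next token id
    return scores


def ported_impl(tokens: List[int]) -> List[float]:
    return [s + 1e-9 for s in reference_impl(tokens)]  # tiny uniform numerical drift


a = greedy_decode(reference_impl, bos=0, eos=4, max_new_tokens=10)
b = greedy_decode(ported_impl, bos=0, eos=4, max_new_tokens=10)
print(a, first_divergence(a, b))  # → [0, 1, 2, 3, 4] None
```

Running both sides in float32, as the gist's title suggests, reduces the chance that precision noise alone flips an arg-max and masks a genuine porting bug.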