Skip to content

Instantly share code, notes, and snippets.

View eustlb's full-sized avatar

eustlb

  • Hugging Face
  • Paris, France
View GitHub Profile
@eustlb
eustlb / benchmark_tdt_loss_kernel.py
Created April 16, 2026 13:49
reproduce benchmarks
"""
Benchmark TDT loss: PyTorch vs CUDA kernel vs NeMo (Numba).
Sweeps over batch sizes and sequence lengths, measuring speed and peak memory.
Usage:
/raid/eustache/venvs/pr-44171/bin/python benchmark_tdt_loss.py
/home/eustache_lebihan/pr-44171/NeMo/.venv/bin/python benchmark_tdt_loss.py --nemo-worker
"""
@eustlb
eustlb / tdt_expected_loss_value.py
Last active April 16, 2026 13:32
reproduce expected value for tdt loss using NeMo
"""
Generate the expected TDT loss reference value using NeMo's GPU kernel.
Runs nvidia/parakeet-tdt-0.6b-v3 in eval mode on 2 LibriSpeech samples
with sigma=0 and computes the HF-style mean reduction (per-sample loss
divided by target length, then averaged across the batch).
NeMo commit: 16f469b122 (v2.8.0rc0)
https://github.com/NVIDIA/NeMo/tree/16f469b122
@eustlb
eustlb / cohere_asr.py
Last active March 26, 2026 19:02
convert cohere asr tokenizer
"""
Convert a CohereAsr SentencePiece tokenizer (.model) to a HuggingFace fast tokenizer.
Downloads the tokenizer files from CohereLabs/cohere-transcribe-03-2026 on HuggingFace Hub.
"""
import json
from pathlib import Path
from huggingface_hub import snapshot_download
@eustlb
eustlb / reproduce_outputs_test_integration_longform.py
Created February 20, 2026 15:18
Reproduce expected outputs for test_integration_longform in transformers/tests/models/mimi/test_modeling_mimi.py
# Reproduce expected outputs for test_integration_longform in
# transformers/tests/models/mimi/test_modeling_mimi.py
#
# This uses the original moshi codebase (https://github.com/kyutai-labs/moshi)
# to generate reference values.
#
# Installation:
# git clone https://github.com/kyutai-labs/moshi.git
# uv pip install -e moshi/moshi/
# uv pip install librosa
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Reproduce expected outputs for each VoxtralRealtime HF integration test.
Uses vLLM offline inference (as in run_eval.py) to generate reference
transcriptions for every @slow integration test in
test_modeling_voxtral_realtime.py, then saves them to a JSON file.
"""
import json
import json
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import jiwer
import torch
from datasets import Audio, load_dataset
from transformers import (
VoxtralRealtimeForConditionalGeneration,
from transformers import LasrTokenizer
from transformers.tokenization_utils_sentencepiece import SentencePieceExtractor
from huggingface_hub import hf_hub_download
import sentencepiece
from datasets import load_dataset
from tqdm import tqdm
path = hf_hub_download(repo_id='wuketest/lasr_test', filename='spiece.model')
# vocab_ids, vocab_scores, merges = SentencePieceExtractor(path).extract()
@eustlb
eustlb / convert_proc.py
Last active December 5, 2025 19:47
Lasr Tokenizer: transformers vs sentencepiece
from transformers import LasrTokenizer, LasrFeatureExtractor, LasrProcessor
from transformers.tokenization_utils_sentencepiece import SentencePieceExtractor
from huggingface_hub import hf_hub_download
import sentencepiece
from datasets import load_dataset
from tqdm import tqdm
import unicodedata
import re
path = hf_hub_download(repo_id='wuketest/lasr_test', filename='spiece.model')
from transformers import AutoProcessor, HiggsAudioForConditionalGeneration
model_id = "eustlb/higgs-v2"
processor = AutoProcessor.from_pretrained(model_id, device_map="cuda")
processor.tokenizer.pad_token = processor.tokenizer.eos_token
model = HiggsAudioForConditionalGeneration.from_pretrained(model_id, device_map="cuda")
# single speaker smart voice
conversation = [
{
from transformers import AutoProcessor
from transformers.models.mllama.image_processing_mllama import convert_aspect_ratios_to_ids
# Load the chat template from file
with open("/Users/eustachelebihan/dev/add-higgs-v2/tmp/chat_template.jinja", "r") as f:
chat_template = f.read()
# Load expected outputs for comparison
with open("/Users/eustachelebihan/dev/add-higgs-v2/expected/single_speaker_with_smart_voice.txt", "r") as f:
expected_single_speaker_with_smart_voice = f.read()