johnmeade’s gists

johnmeade / dist_sampler.py

Last active July 13, 2019 20:28

Fast sampling from arbitrary probability densities in Python

	'''
	Tools for sampling from arbitrary probability densities.

	Requirements:
	pip install scipy numpy

	John Meade 2019
	MIT license
	'''
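The preview above does not show how the gist draws its samples. As a rough illustration only, one common way to sample from an arbitrary one-dimensional density is inverse transform sampling on a grid: evaluate the density, build a normalized cumulative sum, and map uniform random numbers through the inverse of that curve. The function name `sample_from_density` and all of its parameters are hypothetical, and this sketch uses only NumPy rather than the gist's actual code.

```python
import numpy as np

def sample_from_density(pdf, lo, hi, n, grid_size=10_000, rng=None):
    """Hypothetical helper: draw n samples from an unnormalized 1-D density
    on [lo, hi] using inverse transform sampling on a uniform grid."""
    rng = np.random.default_rng() if rng is None else rng
    xs = np.linspace(lo, hi, grid_size)
    weights = pdf(xs)                # density values; need not be normalized
    cdf = np.cumsum(weights)
    cdf = cdf / cdf[-1]              # normalize so the last value is 1
    u = rng.random(n)                # uniform draws in [0, 1)
    # Invert the CDF by linear interpolation (assumes pdf > 0 on the grid,
    # so that the CDF is strictly increasing).
    return np.interp(u, cdf, xs)

# Usage: sample from an unnormalized standard normal truncated to [-5, 5].
samples = sample_from_density(
    lambda x: np.exp(-0.5 * x**2), -5.0, 5.0, 20_000,
    rng=np.random.default_rng(0),
)
```

The grid resolution bounds the accuracy, so densities with sharp peaks would need a finer grid or a different method.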

johnmeade / pool_data_loader.py

Created July 13, 2019 19:50

A Python multiprocessing pool drop-in replacement for the PyTorch DataLoader class

	'''
	A multiprocessing Pool drop-in replacement for the PyTorch
	DataLoader class. Built to work around an apparent bug in
	the default PyTorch DataLoader, in which it hangs indefinitely.
	It is possible to reach a sustained 95-100% GPU usage (as
	reported by `nvidia-smi`) using this implementation.

	Requirements:
	pip install filelock

johnmeade / snr.py

Last active November 11, 2025 06:20

WADA SNR Estimation of Speech Signals in Python

	import numpy as np

	def wada_snr(wav):
		# Direct blind estimation of the SNR of a speech signal.
		#
		# Paper on WADA SNR:
		# http://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf
		#
		# This function was adapted from this matlab code:
		# https://labrosa.ee.columbia.edu/projects/snreval/#9

johnmeade / speech_noise.py

Last active September 6, 2024 10:25

Use VAD to separate out regions without speech content, and estimate mean power of background noise

	"""
	Estimate background noise power level of a speech waveform.
	Requires some non-speech regions in the wave.

	Requirements:
	pip install numpy librosa soundfile webrtcvad

	MIT License John Meade 2021
	"""

johnmeade / aeneas_word_force_align.py

Created April 14, 2021 19:19

Forced Alignment with Aeneas at the word-level

	"""
	Note: Aeneas is based on TTS and DTW, which is not ideal for word-level alignment.
	However, it is easy to install and works quite well, so it is still very useful.
	This gist just lazily writes files to "/tmp" for demonstration purposes.

	System and Python dependencies (Ubuntu):
	sudo apt-get install python-dev espeak espeak-data libespeak1 libespeak-dev ffmpeg
	pip install numpy textgrid
	pip install aeneas

johnmeade / asr.py

Last active August 17, 2024 04:40

Multi-language ASR using Huggingface transformer models.

	"""
	Multi-language ASR using Huggingface transformer models.

	Python dependencies:
	pip install transformers==4.5.0 librosa soundfile torch
	"""

	from typing import NamedTuple
	from functools import lru_cache

johnmeade / para.sh

Last active June 24, 2025 21:20

Parallel file sync / remove, for parallel filesystems (e.g. Lustre)

	#!/bin/bash
	# MIT License / John Meade

	print_usage() {
		echo "Parallel file sync / remove."
		echo
		echo "Installation:"
		echo " gcc --version # eg 'sudo apt install build-essential'"
		echo " sudo chmod 755 para"
		echo " sudo mv para /usr/local/bin"

johnmeade / max-dataload-throughput.py

Last active June 25, 2025 19:07

Generic PyTorch Dataloading Benchmark

	"""
	Generic PyTorch Dataloading Benchmark.
	MIT Licence / John Meade.

	# installation
	pip install torch numpy accelerate

	# example
	ulimit -n 20000
	accelerate config --config_file "acc-cfg-8gpu.yaml"

johnmeade / parallel_pytorch_pipeline.py

Last active January 9, 2026 16:19

High-GPU-utilization PyTorch multiprocessing pipeline example

	"""
	MIT License 2025 John Meade

	Writing custom multiprocessing code for PyTorch can be tricky.
	Generally you should use `accelerate` if possible, but this is not an option for bigger pipelines.
	This gist provides a starting point for high GPU utilization multiprocessing pipelines for PyTorch models.
	Some symptoms of bottlenecks when using this framework are:
	* input queues to GPU workers frequently having size 0
	* high GPU utilization on one GPU, but low utilization on others
	"""
