"""
Multi-language ASR using Hugging Face transformer models.
Python dependencies:
    pip install transformers==4.5.0 librosa soundfile torch
"""
from typing import NamedTuple
from functools import lru_cache

"""
Note: Aeneas is based on TTS and DTW, which is not ideal for word-level alignment.
However, it is easy to install and works quite well, so it is still very useful.
This gist just lazily writes files to "/tmp" for demonstration purposes.
System and Python dependencies (Ubuntu):
    sudo apt-get install python-dev espeak espeak-data libespeak1 libespeak-dev ffmpeg
    pip install numpy textgrid
    pip install aeneas

"""
Estimate the background noise power level of a speech waveform.
The waveform must contain some non-speech regions.
Requirements:
    pip install numpy librosa soundfile webrtcvad
MIT License John Meade 2021
"""
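The idea in the docstring above can be sketched as follows: given a per-frame speech/non-speech decision, average the signal power over the non-speech frames only and report it in dB. This is a minimal sketch and not the gist's own code. The function name `noise_power_db` and its parameters `speech_mask` and `frame_len` are hypothetical, the mask is assumed to come from a voice activity detector such as webrtcvad, and the standard library is used in place of numpy to keep the example self-contained.

```python
import math


def noise_power_db(wav, speech_mask, frame_len):
    """Mean power in dB of the frames flagged as non-speech.

    wav: sequence of float samples.
    speech_mask: one bool per frame, True where the frame contains speech
        (assumed to be produced by a VAD; hypothetical parameter).
    frame_len: number of samples per frame (hypothetical parameter).
    """
    total, count = 0.0, 0
    for i, is_speech in enumerate(speech_mask):
        if is_speech:
            continue  # only non-speech frames contribute to the estimate
        frame = wav[i * frame_len:(i + 1) * frame_len]
        total += sum(x * x for x in frame)
        count += len(frame)
    if count == 0:
        raise ValueError("no non-speech samples to estimate noise from")
    mean_power = total / count
    # Floor the power so that digital silence does not hit log10(0).
    return 10.0 * math.log10(max(mean_power, 1e-12))
```

The explicit error mirrors the requirement stated in the docstring: without any non-speech region there is nothing to estimate the noise level from.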

import numpy as np
def wada_snr(wav):
    # Direct blind estimation of the SNR of a speech signal.
    #
    # Paper on WADA SNR:
    #   http://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf
    #
    # This function was adapted from this MATLAB code:
    #   https://labrosa.ee.columbia.edu/projects/snreval/#9

'''
A multiprocessing Pool drop-in replacement for the PyTorch
DataLoader class. Built to work around an apparent bug in
the default PyTorch DataLoader, in which it hangs indefinitely.
It is possible to reach a sustained 95-100% GPU usage (as
reported by `nvidia-smi`) using this implementation.
Requirements:
    pip install filelock

'''
Tools for sampling from arbitrary probability densities.
Requirements:
    pip install scipy numpy
John Meade 2019
MIT license
'''
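One common way to sample from an arbitrary density on a bounded interval is rejection sampling. This is a minimal sketch of that technique using only the standard library; it is not necessarily the method the gist implements with scipy and numpy. The function name `rejection_sample` and all of its parameters are hypothetical, and `pdf_max` is assumed to be an upper bound on the density over `[lo, hi]`.

```python
import random


def rejection_sample(pdf, lo, hi, pdf_max, n, rng=None):
    """Draw n samples on [lo, hi] from a possibly unnormalised density.

    pdf: callable returning a non-negative density value at x.
    pdf_max: upper bound on pdf over [lo, hi] (assumed to be known).
    rng: optional random.Random instance, for reproducibility.
    """
    if pdf_max <= 0:
        raise ValueError("pdf_max must be positive")
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        x = rng.uniform(lo, hi)
        # Accept x with probability pdf(x) / pdf_max.
        if rng.uniform(0.0, pdf_max) < pdf(x):
            samples.append(x)
    return samples


# Usage: a triangular density p(x) proportional to x on [0, 1].
draws = rejection_sample(lambda x: x, 0.0, 1.0, 1.0, 1000, random.Random(0))
```

The acceptance rate equals the area under the density divided by `pdf_max * (hi - lo)`, so a loose bound still gives correct samples but wastes more proposals.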