fauxneticien’s gists

fauxneticien / jupyter-introduction.json

Last active October 8, 2022 04:51

Testing Saturn Cloud for tutorials

	{
	"name": "jupyter-introduction",
	"image_uri": "public.ecr.aws/saturncloud/saturn:2022.01.06",
	"description": "Introduction to Jupyter",
	"working_directory": "/home/jovyan/workspace/introduction",
	"start_script": "pip install tqdm epitran\npip uninstall -y ipywidgets",
	"git_repositories": [
	{
	"url": "https://github.com/parledoct/tutorials",
	"reference": "introduction",

fauxneticien / setup.sh

Last active June 10, 2022 13:40

Try to replicate fine-tuning wav2vec 2.0 with 10 minutes of Librispeech data

	apt-get install -y tmux

	pip install transformers==4.19.2 datasets jiwer wandb bitsandbytes-cuda113

	wget https://huggingface.co/facebook/wav2vec2-large-960h/raw/main/vocab.json

fauxneticien / check-cuda.py

Last active April 6, 2022 16:37

Check CUDA/cuDNN info

	# Run as:
	# wget -O check-cuda.py https://gist.github.com/fauxneticien/343b1dd7b68a30cb6f8983dacac28721/raw && python check-cuda.py

	# Adapted from https://amytabb.com/til/2020/10/05/cudnn-pytorch/

	import torch

	print("Is cuda available?", torch.cuda.is_available())

	print("CUDA version:", torch.version.cuda)
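The preview above stops at the CUDA version, while the gist title also mentions cuDNN. A minimal sketch of how the cuDNN side could be queried is given below; the function name `cuda_cudnn_info` and the dictionary keys are hypothetical and are not taken from the gist, and the sketch simply returns `None` when PyTorch is not installed.

```python
def cuda_cudnn_info():
    """Return CUDA/cuDNN details as a dict, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,  # None on CPU-only builds
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is absent
    }


info = cuda_cudnn_info()
if info is not None:
    for key, value in info.items():
        print(f"{key}: {value}")
```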

fauxneticien / cis-by-condition.r

Last active December 7, 2021 23:52

Parametric bootstrap estimates for A/B test conditions

	library(boot)
	library(purrr)

	n_bootstraps <- 100
	sample_data <- read.csv("~/Desktop/sample-data.csv", stringsAsFactors = FALSE)

	get_mean_pdiff <- function(data, indices) {
	  d <- data[indices, ]
	  return(mean(d$prop_diff))
	}
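The statistic above takes a data frame and a vector of resampled row indices, which is the shape `boot` expects for ordinary resampling. As a rough illustration of what happens around such a statistic, here is a percentile bootstrap for a mean using only the Python standard library. This is a nonparametric sketch, not the author's procedure; `bootstrap_mean_ci`, its parameters, and the sample values are all made up for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_bootstraps=100, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_bootstraps)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = int((alpha / 2) * (n_bootstraps - 1))
    hi_idx = int((1 - alpha / 2) * (n_bootstraps - 1))
    return means[lo_idx], means[hi_idx]


prop_diff = [0.12, -0.05, 0.30, 0.08, 0.15, -0.02, 0.21, 0.10]  # made-up values
low, high = bootstrap_mean_ci(prop_diff)
print(f"95% CI for mean prop_diff: [{low:.3f}, {high:.3f}]")
```

With only 100 resamples, as in `n_bootstraps` above, the interval endpoints are fairly noisy; a larger count is usually preferred when the run time allows it.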

fauxneticien / wav_to_w2v2-xlsr-feats.py

Created August 4, 2021 21:19

	import os
	import pickle
	import torch
	import soundfile as sf
	import numpy as np
	import pandas as pd

	from argparse import ArgumentParser
	from glob import glob

fauxneticien / transform.R

Created June 16, 2021 05:12

Transform lexicon from long format to wide format

	library(readr)
	library(dplyr)
	library(tidyr)
	library(zoo)

	dict <- read_csv("~/sandboxes/long2wide-dict/test.csv")

	dict %>%
	filter(!is.na(code)) %>%
	mutate(lx_start = ifelse(code == "lx", 1:n(), NA) %>% na.locf()) %>%
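The `mutate` step above tags every row with the row number of the most recent `lx` line, so that all fields of one lexical entry share an id before the table is widened. The same forward-fill idea can be sketched in plain Python as follows; the function name `entry_start_ids` and the example codes other than `lx` are hypothetical. Unlike `na.locf()` with its default settings, this sketch keeps rows that precede the first `lx` and marks them with `None`.

```python
def entry_start_ids(codes):
    """For each row, return the 1-based index of the most recent "lx" row."""
    ids = []
    current = None  # no entry has started yet
    for row_number, code in enumerate(codes, start=1):
        if code == "lx":
            current = row_number  # a new lexical entry starts here
        ids.append(current)  # carry the last seen start forward
    return ids


codes = ["lx", "ps", "ge", "lx", "ge"]  # hypothetical field codes
print(entry_start_ids(codes))  # prints [1, 1, 1, 4, 4]
```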

fauxneticien / sentence.csv

Created April 2, 2021 21:21

MFCC features for a word and a sentence such that the word occurs at the start of the sentence

-0.04994392395019531,1.5175657272338867,0.0018199682235717773,-2.087226390838623,0.6159842014312744,-0.04659390449523926,0.25803107023239136,-1.709064245223999,-0.39627861976623535,-1.5016419887542725,-0.245763897895813,-0.5718547105789185,-1.7099677324295044,0.20130681991577148,0.019677013158798218,0.009180034510791302,-0.1376107931137085,-0.27925705909729004,0.13714897632598877,-0.07046601176261902,0.3395354151725769,-0.09866419434547424,0.16999399662017822,0.309618204832077,0.011659674346446991,0.5186675190925598,0.05075535178184509,-0.028768420219421387,-0.01901824399828911,-0.03607192635536194,0.007255005184561014,0.04746334254741669,-0.07811667770147324,0.06717421114444733,0.004581433720886707,0.08137884736061096,0.004368830006569624,-0.10787150263786316,0.0864943340420723

0.48593616485595703,1.8348214626312256,0.20389342308044434,-2.4626524448394775,-0.7713630795478821,0.5448504686355591,0.19752508401870728,-0.7329893708229065,-0.5736892819404602,-1.0117650032043457,0.645517110824585,-0.604004144668579

fauxneticien / goodbye-hello-goodbye.wav.w2v2.csv

Created April 2, 2021 20:38

wav2vec 2.0 features extracted from two wav files (hello.wav, goodbye-hello-goodbye.wav)

-6.470265388488769531e-01,6.466341614723205566e-01,6.083387136459350586e-01,-1.390124261379241943e-01,-2.054527997970581055e-01,-6.464222446084022522e-03,1.553679406642913818e-01,-4.968349635601043701e-02,-5.051375031471252441e-01,-3.241975903511047363e-01,-9.865309298038482666e-02,-6.458837538957595825e-02,2.464481592178344727e-01,-3.349821865558624268e-01,-3.121077120304107666e-01,-4.898094013333320618e-02,-5.539919435977935791e-02,-7.468717098236083984e-01,4.304423928260803223e-02,-4.351349547505378723e-02,1.737072616815567017e-01,4.124327898025512695e-01,-6.332009285688400269e-02,4.159340560436248779e-01,1.294069290161132812e-01,2.008238285779953003e-01,-1.832776367664337158e-01,-1.074232980608940125e-01,1.395025968551635742e+00,1.074964478611946106e-01,-1.575990617275238037e-01,-3.904544711112976074e-01,-4.046170786023139954e-02,-1.560430824756622314e-01,-1.653008013963699341e-01,3.130651712417602539e-01,-8.037760108709335327e-02,5.392159894108772278e-02,-1.310994476079940796e-01,3.478820025920867920e-01

fauxneticien / dtwupd-test.ipynb

Created April 1, 2021 18:18

Test dtwupd function on speech-like matrices

fauxneticien / dtw_cython-setup.py

Created March 31, 2021 23:02

Segmental DTW implementation in Cython

	from setuptools import setup
	from Cython.Build import cythonize
	import numpy

	setup(
	    ext_modules=cythonize("dtw_cython.pyx", annotate=True),
	    include_dirs=[numpy.get_include()],
	)
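The build script compiles `dtw_cython.pyx`, whose contents are not shown here. For orientation, the recurrence that dynamic time warping is built on can be sketched in pure Python as below. This is plain DTW over whole 1-D sequences with an absolute-difference local cost, not the segmental variant and not the gist's Cython code; `dtw_cost` is a hypothetical name.

```python
def dtw_cost(a, b):
    """Total cost of the cheapest DTW alignment between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            # Extend the best of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))  # prints 0.0
```

The double loop over every pair of frames is the part that tends to be slow in pure Python, which is one common reason for moving this kind of routine to Cython.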

Nay San fauxneticien