Nay San fauxneticien

@fauxneticien
fauxneticien / jupyter-introduction.json
Last active October 8, 2022 04:51
Testing Saturn Cloud for tutorials
{
  "name": "jupyter-introduction",
  "image_uri": "public.ecr.aws/saturncloud/saturn:2022.01.06",
  "description": "Introduction to Jupyter",
  "working_directory": "/home/jovyan/workspace/introduction",
  "start_script": "pip install tqdm epitran\npip uninstall -y ipywidgets",
  "git_repositories": [
    {
      "url": "https://github.com/parledoct/tutorials",
      "reference": "introduction",
fauxneticien / setup.sh
Last active June 10, 2022 13:40
Try to replicate fine-tuning wav2vec 2.0 with 10 minutes of Librispeech data
apt-get install -y tmux
pip install transformers==4.19.2 datasets jiwer wandb bitsandbytes-cuda113
wget https://huggingface.co/facebook/wav2vec2-large-960h/raw/main/vocab.json
fauxneticien / check-cuda.py
Last active April 6, 2022 16:37
Check CUDA/cuDNN info
# Run as:
# wget -O check-cuda.py https://gist.github.com/fauxneticien/343b1dd7b68a30cb6f8983dacac28721/raw && python check-cuda.py
# Adapted from https://amytabb.com/til/2020/10/05/cudnn-pytorch/
import torch
print("Is cuda available?", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
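The description mentions cuDNN, while the lines shown above only query CUDA. As a minimal sketch (not necessarily what the rest of the gist does), the cuDNN details can be read from `torch.backends.cudnn`. The helper name `cuda_report` is hypothetical, and the guard for a missing PyTorch install is an addition for illustration.

```python
import importlib.util


def cuda_report():
    """Collect CUDA/cuDNN details, or note that PyTorch is not installed.

    `cuda_report` is a hypothetical helper name, not part of the gist.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        # None on CPU-only builds of PyTorch
        "cuda_version": torch.version.cuda,
        "cudnn_available": torch.backends.cudnn.is_available(),
        # None when cuDNN cannot be loaded
        "cudnn_version": torch.backends.cudnn.version(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Returning a dictionary rather than printing directly makes the check easy to log or assert on in a setup script.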
fauxneticien / cis-by-condition.r
Last active December 7, 2021 23:52
Parametric bootstrap estimates for A/B test conditions
library(boot)
library(purrr)
n_bootstraps <- 100
sample_data <- read.csv("~/Desktop/sample-data.csv", stringsAsFactors = FALSE)
get_mean_pdiff <- function(data, indices) {
  d <- data[indices,]
  return(mean(d$prop_diff))
}
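The `(data, indices)` signature above is the form `boot::boot` expects for a statistic under its default resampling scheme. For readers more at home in Python, here is a rough sketch of that idea: resample the observations with replacement, recompute the mean each time, and take percentiles of the resampled means. It is a plain percentile bootstrap under assumed defaults (`alpha`, `seed`, and the function name are hypothetical) and may differ from what the full gist computes per condition.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_bootstraps=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Hypothetical helper: resamples with replacement, as boot::boot does
    when it passes row indices to the statistic function.
    """
    rng = random.Random(seed)
    n = len(values)
    # One resampled mean per bootstrap replicate, sorted for percentile lookup
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_bootstraps)
    )
    lower = means[int((alpha / 2) * (n_bootstraps - 1))]
    upper = means[int((1 - alpha / 2) * (n_bootstraps - 1))]
    return lower, upper


if __name__ == "__main__":
    prop_diff = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values for illustration
    print(bootstrap_mean_ci(prop_diff))
```

With only 100 replicates, as in `n_bootstraps` above, the interval endpoints are fairly noisy; a larger count is a common choice when the estimate matters.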
import os
import pickle
from argparse import ArgumentParser
from glob import glob
import numpy as np
import pandas as pd
import soundfile as sf
import torch
fauxneticien / transform.R
Created June 16, 2021 05:12
Transform lexicon from long format to wide format
library(readr)
library(dplyr)
library(tidyr)
library(zoo)
dict <- read_csv("~/sandboxes/long2wide-dict/test.csv")
dict %>%
  filter(!is.na(code)) %>%
  mutate(lx_start = ifelse(code == "lx", 1:n(), NA) %>% na.locf()) %>%
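The preview stops partway through the pipeline, but the visible steps suggest the approach: drop rows with no `code`, mark each `lx` row as the start of an entry, and carry that marker forward so every following row belongs to the same entry. Here is a hedged pure-Python sketch of that grouping. The `(code, value)` row shape, the codes other than `lx`, the function name, and the `"; "` joiner for repeated codes are all assumptions for illustration, not taken from the gist.

```python
def long_to_wide(rows):
    """Group (code, value) rows into one dict per lexicon entry.

    Hypothetical helper: an "lx" row starts a new entry, and later rows
    attach to the most recent entry (the role na.locf() plays in the R code).
    """
    entries = []
    for code, value in rows:
        if code is None:
            continue  # mirrors filter(!is.na(code))
        if code == "lx":
            entries.append({})
        if not entries:
            continue  # rows before the first "lx" have no entry to join
        entry = entries[-1]
        # Assumption: repeated codes within an entry are joined with "; "
        entry[code] = f"{entry[code]}; {value}" if code in entry else value
    return entries


if __name__ == "__main__":
    sample = [
        ("lx", "dog"), ("ps", "n"), ("ge", "canine"),
        (None, "stray row"),
        ("lx", "cat"), ("ge", "feline"), ("ge", "kitty"),
    ]
    for entry in long_to_wide(sample):
        print(entry)
```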
fauxneticien / sentence.csv
Created April 2, 2021 21:21
MFCC features for a word and a sentence such that the word occurs at the start of the sentence
-0.04994392395019531,1.5175657272338867,0.0018199682235717773,-2.087226390838623,0.6159842014312744,-0.04659390449523926,0.25803107023239136,-1.709064245223999,-0.39627861976623535,-1.5016419887542725,-0.245763897895813,-0.5718547105789185,-1.7099677324295044,0.20130681991577148,0.019677013158798218,0.009180034510791302,-0.1376107931137085,-0.27925705909729004,0.13714897632598877,-0.07046601176261902,0.3395354151725769,-0.09866419434547424,0.16999399662017822,0.309618204832077,0.011659674346446991,0.5186675190925598,0.05075535178184509,-0.028768420219421387,-0.01901824399828911,-0.03607192635536194,0.007255005184561014,0.04746334254741669,-0.07811667770147324,0.06717421114444733,0.004581433720886707,0.08137884736061096,0.004368830006569624,-0.10787150263786316,0.0864943340420723
0.48593616485595703,1.8348214626312256,0.20389342308044434,-2.4626524448394775,-0.7713630795478821,0.5448504686355591,0.19752508401870728,-0.7329893708229065,-0.5736892819404602,-1.0117650032043457,0.645517110824585,-0.604004144668579
fauxneticien / goodbye-hello-goodbye.wav.w2v2.csv
Created April 2, 2021 20:38
wav2vec 2.0 features extracted from two wav files (hello.wav, goodbye-hello-goodbye.wav)
-6.470265388488769531e-01,6.466341614723205566e-01,6.083387136459350586e-01,-1.390124261379241943e-01,-2.054527997970581055e-01,-6.464222446084022522e-03,1.553679406642913818e-01,-4.968349635601043701e-02,-5.051375031471252441e-01,-3.241975903511047363e-01,-9.865309298038482666e-02,-6.458837538957595825e-02,2.464481592178344727e-01,-3.349821865558624268e-01,-3.121077120304107666e-01,-4.898094013333320618e-02,-5.539919435977935791e-02,-7.468717098236083984e-01,4.304423928260803223e-02,-4.351349547505378723e-02,1.737072616815567017e-01,4.124327898025512695e-01,-6.332009285688400269e-02,4.159340560436248779e-01,1.294069290161132812e-01,2.008238285779953003e-01,-1.832776367664337158e-01,-1.074232980608940125e-01,1.395025968551635742e+00,1.074964478611946106e-01,-1.575990617275238037e-01,-3.904544711112976074e-01,-4.046170786023139954e-02,-1.560430824756622314e-01,-1.653008013963699341e-01,3.130651712417602539e-01,-8.037760108709335327e-02,5.392159894108772278e-02,-1.310994476079940796e-01,3.478820025920867920e-01
fauxneticien / dtwupd-test.ipynb
Created April 1, 2021 18:18
Test dtwupd function on speech-like matrices
fauxneticien / dtw_cython-setup.py
Created March 31, 2021 23:02
Segmental DTW implementation in Cython
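# Build in place with: python setup.py build_ext --inplace
# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize
import numpy

setup(
    # annotate=True also writes an HTML report of the generated C code
    ext_modules=cythonize("dtw_cython.pyx", annotate=True),
    # NumPy headers, needed if dtw_cython.pyx uses the NumPy C API
    include_dirs=[numpy.get_include()],
)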
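The `dtw_cython.pyx` source itself is not shown here. For orientation only, the following is a minimal pure-Python sketch of plain dynamic time warping between two feature sequences. It is the standard whole-sequence recurrence, not the segmental variant the gist describes and not the author's implementation; the function name and the Euclidean frame distance are assumptions.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Hypothetical helper: plain (non-segmental) DTW with Euclidean frame
    distance and the usual three-way step pattern.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


if __name__ == "__main__":
    query = [[0.0], [1.0], [2.0]]
    reference = [[0.0], [1.0], [1.0], [2.0]]
    print(dtw_distance(query, reference))
```

The nested loops are the part that typically benefits from compiling with Cython, which is presumably why the gist ships a `setup.py` for a `.pyx` module.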