@Vaibhavs10
Vaibhavs10 / pep8-cheatsheet.py
Created November 7, 2019 07:59
PEP 8 Cheatsheet
#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""This module's docstring summary line.
This is a multi-line docstring. Paragraphs are separated with blank lines.
Lines conform to 79-column limit.
Module and package names should be short, lower_case_with_underscores.
Notice that this is not PEP8-cheatsheet.py
Seriously, use flake8. Atom.io with https://atom.io/packages/linter-flake8
is awesome!
See http://www.python.org/dev/peps/pep-0008/ for more PEP-8 details
Vaibhavs10 / oracle_pandas.py
Last active November 12, 2019 17:24
Script to connect with Oracle and create a pandas dataframe
import pandas as pd
from sqlalchemy import create_engine
import cx_Oracle
oracle_connection_string = (
    'oracle+cx_oracle://{username}:{password}@' +
    cx_Oracle.makedsn('{hostname}', '{port}', service_name='{service_name}')
)
engine = create_engine(
# Credits: https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import seaborn as sns
# Total population, N.
N = 1339200000
%matplotlib inline
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
# Total population, N.
N = 1339200000
# Initial number of infected and recovered individuals, I0 and R0.
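
The preview above stops before the model itself is defined. As a rough sketch of how the SIR equations could be integrated for this population size, the block below uses a plain fixed-step forward Euler loop instead of the `odeint` call the gist imports, so it runs with the standard library only. The function names and the values of `i0`, `r0`, `beta`, `gamma` and `days` are illustrative assumptions, not taken from the gist.

```python
def sir_derivatives(s, i, r, n, beta, gamma):
    """Right-hand side of the SIR equations for a population of size n."""
    ds = -beta * s * i / n
    di = beta * s * i / n - gamma * i
    dr = gamma * i
    return ds, di, dr


def simulate_sir(n, i0, r0, beta, gamma, days, steps_per_day=10):
    """Integrate the SIR model with forward Euler.

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s, i, r = n - i0 - r0, i0, r0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            ds, di, dr = sir_derivatives(s, i, r, n, beta, gamma)
            s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        history.append((s, i, r))
    return history


# Total population, N (from the gist). Everything else is a placeholder.
N = 1339200000
history = simulate_sir(N, i0=1, r0=0, beta=0.2, gamma=0.1, days=160)
```

Forward Euler is less accurate than an adaptive solver such as `odeint`; it is used here only to keep the sketch dependency-free.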
Vaibhavs10 / robust-asr.md
Last active May 17, 2022 10:12
Robust ASR: An applied survey of current SoTA ASR architectures

Motivation

Whilst the current ASR landscape is really promising, a lot of it is currently benchmarked on rather "clean" datasets. This often creates a false sense of confidence in an architecture, which might not translate to the real world.

Types of Noises

  1. Gaussian White Noise
  2. Real World Noise
  3. Choppy audio (random 1-2s removed from the audio snippet)
  4. Speed up (random 10s snippets played faster than the rest)
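
As a minimal sketch of how three of these perturbations might be applied to a waveform, the block below works on a plain list of float samples using only the standard library. The function names, the `noise_std` and `factor` parameters, and the naive "keep every n-th sample" speed-up are assumptions made for illustration; real-world noise is left out because it needs a recorded noise clip to mix in.

```python
import random


def add_white_noise(samples, noise_std, rng):
    """Add zero-mean Gaussian noise with standard deviation noise_std."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def drop_random_chunk(samples, sample_rate, rng, min_s=1.0, max_s=2.0):
    """Remove one random chunk lasting between min_s and max_s seconds."""
    chunk = min(int(rng.uniform(min_s, max_s) * sample_rate), len(samples))
    start = rng.randint(0, len(samples) - chunk)
    return samples[:start] + samples[start + chunk:]


def speed_up_segment(samples, sample_rate, rng, segment_s=10.0, factor=2):
    """Speed up one random segment by keeping every factor-th sample.

    This is naive decimation: it also raises the pitch and does no
    anti-alias filtering.
    """
    seg = min(int(segment_s * sample_rate), len(samples))
    start = rng.randint(0, len(samples) - seg)
    fast = samples[start:start + seg:factor]
    return samples[:start] + fast + samples[start + seg:]


# Example: 30 seconds of silence at a toy sample rate of 100 Hz.
rng = random.Random(0)
sample_rate = 100
audio = [0.0] * (30 * sample_rate)

noisy = add_white_noise(audio, noise_std=0.1, rng=rng)
choppy = drop_random_chunk(audio, sample_rate, rng)
fast = speed_up_segment(audio, sample_rate, rng)
```

Passing an explicit `random.Random` instance keeps each perturbation reproducible, which helps when comparing models on the same corrupted clips.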

Evaluation

Hey hey!

We are on a mission to democratise speech, increase the language coverage of current SoTA speech recognition and push the limits of what is possible. Come join us from December 5th - 19th for a community sprint powered by Lambda Labs. Through this sprint, we'll cover 70+ languages, 39M - 1550M parameters & evaluate our models on real-world evaluation datasets.

Register your interest via the Google form here.

What is the sprint about ❓

The goal of the sprint is to fine-tune Whisper in as many languages as possible and make the resulting models accessible to the community. We especially hope that low-resource languages will benefit from this event.

user_name=
ssh_key=""
cd /home
sudo useradd -m "$user_name"
sudo mkdir /home/"$user_name"/.ssh
echo "$ssh_key" | sudo tee -a /home/"$user_name"/.ssh/authorized_keys
sudo chsh -s /usr/bin/bash "$user_name"
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
#pip install git+https://github.com/huggingface/transformers.git
import datetime
import sys
from transformers import pipeline
from transformers.pipelines.audio_utils import ffmpeg_microphone_live
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-base", device=0)
sampling_rate = pipe.feature_extractor.sampling_rate
Vaibhavs10 / how_to_use_cv11.py
Created February 13, 2023 16:13
How to use Common Voice 11 with 🤗Datasets
# Load the dataset (locally)
from datasets import load_dataset
cv_11 = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="train")
# Stream the dataset
from datasets import load_dataset
Vaibhavs10 / zephyr-7b-beta-gptq-transformers.py
Created November 13, 2023 21:55
zephyr-7b-beta-gptq-transformers
!pip install transformers optimum
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "TheBloke/zephyr-7B-beta-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,