@Dharisd
Last active July 29, 2024 12:59
#!/bin/bash
# Function to check if a command exists
command_exists() {
    command -v "$1" >/dev/null 2>&1
}
# Check if all required arguments are provided
if [ "$#" -ne 5 ]; then
    echo "Usage: $0 <HF_TOKEN> <WANDB_TOKEN> <GDRIVE_FILE_ID> <GITHUB_USERNAME> <GITHUB_PASSWORD>"
    exit 1
fi
HF_TOKEN="$1"
WANDB_TOKEN="$2"
GDRIVE_FILE_ID="$3"
GITHUB_USERNAME="$4"
GITHUB_PASSWORD="$5"
# Ensure script is run as root or with sudo
if [ "$EUID" -ne 0 ]; then
    echo "Please run as root or using sudo"
    exit 1
fi
# Install system dependencies
apt-get update && apt-get install -y tmux nvtop unzip python3-pip git
# Install Python dependencies
pip3 install huggingface_hub hf-transfer transformers openai-whisper datasets wandb soundfile librosa accelerate jiwer evaluate gdown mosaicml-streaming
# Install flash-attn
pip3 install flash-attn --no-build-isolation
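# Sketch, not part of the original gist: confirm that the tools the later
# steps rely on actually ended up on PATH after the installs above.
# require_commands is a hypothetical helper name; it performs the same
# `command -v` check as command_exists, once per argument, and only warns
# instead of aborting.
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
require_commands git unzip tmux gdown huggingface-cli wandb ||
    echo "Warning: some tools are missing; later steps may fail." >&2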
# Login to Hugging Face using token
huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential
# Login to Weights & Biases using token
wandb login "$WANDB_TOKEN"
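# Configure GitHub credentials so that git can authenticate over HTTPS.
# Note: GitHub no longer accepts account passwords for git operations, so
# GITHUB_PASSWORD must hold a personal access token. The credentials are
# stored in plain text in ~/.git-credentials.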
git config --global credential.helper store
echo "https://${GITHUB_USERNAME}:${GITHUB_PASSWORD}@github.com" > ~/.git-credentials
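# Sketch, not part of the original gist: a username or token containing
# characters such as '@', ':' or '/' would corrupt the URL written above.
# urlencode is a hypothetical helper that percent-encodes everything except
# unreserved characters; it assumes ASCII input. The function body runs in a
# subshell, so its working variables do not leak into the script.
urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)
# To use it, the echo above could write "$(urlencode "$GITHUB_USERNAME")" and
# "$(urlencode "$GITHUB_PASSWORD")" in place of the raw variables.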
# Download and unzip training data
DATASET_ZIP="dataset.zip"
DATASET_DIR="dataset"
if [ ! -d "$DATASET_DIR" ]; then
    gdown "https://drive.google.com/uc?id=$GDRIVE_FILE_ID" -O "$DATASET_ZIP"
    unzip "$DATASET_ZIP" -d "$DATASET_DIR"
    rm "$DATASET_ZIP"
fi
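# Clone the training repository
git clone https://github.com/Dharisd/asr-training.git
# Make the run script executable (lowercase x; +X leaves a regular file
# without any execute bit unchanged)
chmod +x /workspace/run.sh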
echo "Setup complete! You can now start your training process."