Training Stanford Alpaca

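Run on runpod.io
Uses Llama-7B
The LLaMA model is downloaded to the home directory, outside /workspace, so allocate about 15 GB of container disk
Set the volume size to at least 30 GB, because the files saved to output after fine-tuning take up a little over 25 GB
The Hugging Face model does not need to be downloaded separately; it is fetched automatically when training starts
On 4 x A100 80GB, the estimated remaining time showed 5:37:57 at the 1% mark of the first run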
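
The disk sizes above can be checked before starting a long run. The following is a minimal sketch, not part of the original notes: `check_free_gb` is a hypothetical helper name, and it assumes a POSIX `df` and `awk` are available in the container.

```shell
# Hypothetical helper: succeed only if PATH has at least NEED_GB gigabytes free.
check_free_gb() {
    cfg_path="$1"
    cfg_need_gb="$2"
    # df -Pk prints one line per filesystem; field 4 is available space in KB.
    cfg_avail_kb=$(df -Pk "$cfg_path" | awk 'NR==2 {print $4}')
    cfg_avail_gb=$(( cfg_avail_kb / 1024 / 1024 ))
    if [ "$cfg_avail_gb" -lt "$cfg_need_gb" ]; then
        echo "only ${cfg_avail_gb}GB free at $cfg_path, want ${cfg_need_gb}GB" >&2
        return 1
    fi
    echo "${cfg_avail_gb}GB free at $cfg_path"
}
```

With the sizes from the notes above, a check might look like `check_free_gb "$HOME" 15 && check_free_gb /workspace 30`.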

Install vi

apt update
apt install vim

Go to Workspace

cd /workspace

Git LFS

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
apt install git-lfs
git lfs install

Clone stanford alpaca

git clone https://github.com/tatsu-lab/stanford_alpaca.git

Install transformers from git & requirements.txt

cd stanford_alpaca
pip install git+https://github.com/huggingface/transformers.git
pip install -r requirements.txt

Train alpaca

torchrun --nproc_per_node=4 --master_port=8080 train.py \
    --model_name_or_path decapoda-research/llama-7b-hf \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir ../output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True
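
To make the batch settings in the command above easier to reason about, the effective batch size and an approximate step count can be worked out with shell arithmetic. This is a rough sketch: the example count of 52002 for alpaca_data.json is an assumption not stated in these notes, and the integer division only approximates how the trainer counts steps.

```shell
# Values copied from the torchrun command above.
NUM_GPUS=4            # --nproc_per_node
PER_DEVICE_BATCH=4    # --per_device_train_batch_size
GRAD_ACCUM=8          # --gradient_accumulation_steps
EPOCHS=3              # --num_train_epochs
# Assumption: number of examples in alpaca_data.json (not stated in these notes).
NUM_EXAMPLES=52002

# Examples consumed per optimizer step across all GPUs.
EFFECTIVE_BATCH=$(( NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM ))
# Approximate optimizer steps per epoch and in total (integer division).
STEPS_PER_EPOCH=$(( NUM_EXAMPLES / EFFECTIVE_BATCH ))
TOTAL_STEPS=$(( STEPS_PER_EPOCH * EPOCHS ))

echo "effective batch: $EFFECTIVE_BATCH"
echo "approx. total steps: $TOTAL_STEPS"
```

If the estimated total stays below the `--save_steps 2000` value, the run would likely finish before any intermediate checkpoint is written, leaving only the final save in the output directory.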

Troubleshooting

Change from LLaMATokenizer to LlamaTokenizer

vi /root/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/tokenizer_config.json
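
The same rename can be done without opening an editor. This is a minimal sketch, assuming the class name appears in the file as the quoted string "LLaMATokenizer"; `fix_tokenizer_class` is a hypothetical helper name, not something from the Alpaca repository.

```shell
# Hypothetical helper: rewrite the tokenizer class name in a tokenizer_config.json.
fix_tokenizer_class() {
    tok_config_file="$1"
    # -i.bak edits the file in place and keeps the original as <file>.bak.
    sed -i.bak 's/"LLaMATokenizer"/"LlamaTokenizer"/' "$tok_config_file"
}
```

It would be called with the tokenizer_config.json path shown in the vi command above.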

Exception: Could not find the transformer layer class to wrap in the model

See the comment on tatsu-lab/stanford_alpaca#58
- Change the value of fsdp_transformer_layer_cls_to_wrap to LlamaDecoderLayer. The training command above already includes this change.

Disable wandb syncing when running under nohup

wandb offline
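
As an alternative to running `wandb offline` first, offline mode can be set for a single background job through the `WANDB_MODE` environment variable. This is a sketch under that assumption; `run_detached` is a hypothetical wrapper name and the log file path is whatever the caller passes in.

```shell
# Hypothetical wrapper: run a command under nohup with wandb in offline mode,
# sending stdout and stderr to the given log file.
run_detached() {
    det_log_file="$1"
    shift
    # WANDB_MODE=offline applies only to this command, not to the whole shell.
    WANDB_MODE=offline nohup "$@" > "$det_log_file" 2>&1 &
    # Print the background process ID so the job can be checked later.
    echo "$!"
}
```

For example, `run_detached train.log torchrun --nproc_per_node=4 --master_port=8080 train.py` followed by the remaining flags from the training command would start the run in the background, and `tail -f train.log` would follow its progress.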

edp1096/alpaca_train_run.md

