spicyboros-13b-2.2-tuning

Dataset used: https://huggingface.co/datasets/jondurbin/airoboros-2.2

Specifically, the instructions.jsonl file.
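
To sanity-check the data before training, you can peek at the first record (a minimal sketch; the exact field names are assumptions based on the dataset card, and the file is assumed to be in the working directory):

import json

# Read the first line of the JSONL file and list its top-level fields.
with open("instructions.jsonl") as f:
    record = json.loads(f.readline())

print(sorted(record))               # expected to include "instruction" and "response"
print(record["instruction"][:200])  # preview the first prompt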

Fine-tuned with my fork of qlora: https://github.com/jondurbin/qlora

8x 80GB A100s

This was a full fine-tune (yes, the script is called qlora, but the --full_finetune flag trains all model weights rather than quantized LoRA adapters).

export BASE_DIR=/workspace
export WANDB_API_KEY=[redacted]
export WANDB_PROJECT=spicyboros-13b-2.2

torchrun --nnodes=1 --nproc_per_node=8 qlora/flash_qlora.py \
  --model_name_or_path $BASE_DIR/llama-2-13b-hf \
  --output_dir $BASE_DIR/$WANDB_PROJECT \
  --num_train_epochs 5 \
  --logging_steps 1 \
  --save_strategy steps \
  --save_steps 25 \
  --save_total_limit 4 \
  --data_seed 11422 \
  --evaluation_strategy steps \
  --eval_dataset_size 0.02 \
  --eval_steps 25 \
  --max_new_tokens 4096 \
  --dataloader_num_workers 3 \
  --logging_strategy steps \
  --remove_unused_columns False \
  --do_train \
  --full_finetune \
  --bits 16 \
  --bf16 \
  --dataset $BASE_DIR/instructions-clean.jsonl \
  --dataset_format airoboros \
  --model_max_len 4096 \
  --per_device_train_batch_size 4 \
  --learning_rate 0.000022 \
  --lr_scheduler_type cosine \
  --warmup_ratio 0.005 \
  --weight_decay 0.0 \
  --seed 11422 \
  --report_to wandb \
  --deepspeed deepspeed-13b.json \
  --optim adamw_torch \
  --gradient_checkpointing True \
  --ddp_find_unused_parameters False
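
For reference, the effective batch size follows from the flags above (a rough sketch; gradient accumulation is "auto" in the DeepSpeed config, so 1 is an assumption, and the example count is hypothetical):

import math

per_device = 4      # --per_device_train_batch_size
world_size = 8      # torchrun --nproc_per_node=8
grad_accum = 1      # "auto" in deepspeed-13b.json; assumed to resolve to 1

global_batch = per_device * world_size * grad_accum   # 32 sequences per optimizer step
examples = 30_000                                     # hypothetical dataset size
steps_per_epoch = math.ceil(examples / global_batch)
print(global_batch, steps_per_epoch * 5)              # --num_train_epochs 5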

DeepSpeed config (deepspeed-13b.json):

{
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "allgather_bucket_size": 5e8
  }
}
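
For intuition on why ZeRO stage 2 in bf16 fits on the 80GB cards, here is rough per-GPU memory arithmetic (a sketch, not a measurement; the parameter count and the fp32 optimizer-state layout are assumptions):

PARAMS = 13e9   # llama-2-13b, approximate parameter count
GPUS = 8        # world size

weights_gb = PARAMS * 2 / 1e9           # bf16 weights, replicated on every rank at stage 2
grads_gb = PARAMS * 2 / GPUS / 1e9      # bf16 gradients, sharded across ranks
optim_gb = PARAMS * 12 / GPUS / 1e9     # fp32 master weights + Adam m and v, sharded

print(f"~{weights_gb + grads_gb + optim_gb:.0f} GB static per GPU, before activations")  # ~49 GB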