Setup notes for Windows 11

This is how I got https://github.com/facebookresearch/llama working with Llama 2 on a Windows 11 machine with an RTX 4080 (16GB VRAM).

  1. Download a modern version of wget (with support for TLS 1.2) - e.g. https://eternallybored.org/misc/wget/
  2. (if necessary) Modify download.sh to call your version of wget instead of the default one.
  3. Run ./download.sh via Git Bash and give it the URL from your email (it should start with https://download.llamameta.net, not https://l.facebook.com/). Warning: the 70B parameter models are big - figure 2GB to download per 1B parameters.
  4. Create a virtual environment - python -m venv .venv
  5. Activate virtual environment - . .\.venv\scripts\Activate.ps1
  6. Install prereqs - pip install -r requirements.txt
  7. Remove CPU-based torch - pip uninstall torch
  8. Install CUDA-based torch - pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 (this takes a while - it's ~2.6GB just for torch)
  9. Run python -m torch.utils.collect_env to confirm you're running the CUDA-enabled build and that it detects your GPU (there's also a quick interactive check sketched after this list).
  10. Modify example_text_completion.py to add import torch and to call torch.distributed.init_process_group("gloo") at the top of main() - I couldn't find a Windows build of torch with both CUDA and NCCL. (A sketch of the edited file is below this list.)
  11. Run torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 1280 --max_batch_size 4 --max_gen_len 1024
  12. Edit example_text_completion.py and re-run the above to play with the model.
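
For step 9, a quick interactive sanity check from the activated virtual environment looks like this (a minimal sketch; the exact version string and device name will differ on your machine):

```python
import torch

print(torch.__version__)          # should report a +cu118 build
print(torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 4080"
```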
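
For step 10, the change is small. Here's a sketch of example_text_completion.py after the edit, trimmed to a single prompt - everything except the two lines marked "added for Windows" is meant to follow the stock example from the repo, so double-check it against your copy:

```python
# example_text_completion.py (sketch) - only the "added for Windows" lines are new
import fire
import torch  # added for Windows

from llama import Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 128,
    max_gen_len: int = 64,
    max_batch_size: int = 4,
):
    # added for Windows: use the gloo backend, since Windows CUDA builds of
    # torch ship without NCCL; Llama.build() skips its own init if this ran
    torch.distributed.init_process_group("gloo")

    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    prompts = ["I believe the meaning of life is"]
    results = generator.text_completion(
        prompts,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )
    for prompt, result in zip(prompts, results):
        print(prompt)
        print(f"> {result['generation']}")


if __name__ == "__main__":
    fire.Fire(main)
```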

I wasn't able to get the 13B model to work - it failed with AssertionError: Loading a checkpoint for MP=2 but world size is 1, and setting --nproc_per_node 2 gave RuntimeError: CUDA error: invalid device ordinal because I only have one GPU. meta-llama/llama#101 (comment) looks like a possible way to "reshard" the 13B checkpoint to run on a single GPU, but I haven't investigated it.
