This is how I got https://github.com/facebookresearch/llama working with Llama 2 on a Windows 11 machine with a 4080 (16GB VRAM).
- Download a modern version of wget (with support for TLS 1.2), e.g. https://eternallybored.org/misc/wget/
- (If necessary) Modify `download.sh` to call your version of `wget` instead of the default one.
- Run `./download.sh` via Git Bash and give it the URL from your email (it should start with `https://download.llamameta.net`, not `https://l.facebook.com/`). Warning: the 70B-parameter models are big - figure roughly 2GB of download per 1B parameters.
- Create a virtual environment - `python -m venv .venv`
- Activate the virtual environment - `. .\.venv\scripts\Activate.ps1`
- Install prerequisites - `pip install -r requirements.txt`
- Remove the CPU-only torch - `pip uninstall torch`
- Install CUDA-enabled torch - `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` (this takes a while - it's ~2.6GB just for torch)
- Run `python -m torch.utils.collect_env` to confirm you're running the CUDA-supporting version and that it detects CUDA. (There's also a quicker sanity check after this list.)
- Modify `example_text_completion.py` to add `import torch` and put `torch.distributed.init_process_group("gloo")` in `main()` - I couldn't find a Windows build of torch with both CUDA and NCCL. (A sketch of the patched file follows the list.)
- Run `torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 1280 --max_batch_size 4 --max_gen_len 1024`
- Edit `example_text_completion.py` and re-run the above to play with the model.
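
As a quicker sanity check than `collect_env` (this snippet is mine, not part of the repo), you can ask torch directly whether it sees the GPU:

```python
import torch

# True means the CUDA build is installed and a GPU is visible;
# False means you're still on the CPU-only wheel.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 4080"
```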
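
And here's roughly what the patched `example_text_completion.py` ends up looking like. Only the `import torch` and the `init_process_group` call are my additions; the rest is paraphrased from the stock example, so treat it as a sketch rather than a verbatim copy:

```python
import torch  # added
import fire
from llama import Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    max_seq_len: int = 128,
    max_gen_len: int = 64,
    max_batch_size: int = 4,
):
    # Added: use the gloo backend, since I couldn't find a Windows
    # build of torch that ships with both CUDA and NCCL.
    torch.distributed.init_process_group("gloo")

    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )
    # ...the rest of the stock main() (building prompts and calling
    # generator.text_completion) is unchanged.


if __name__ == "__main__":
    fire.Fire(main)
```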
I wasn't able to get the 13B model to work: I was getting `AssertionError: Loading a checkpoint for MP=2 but world size is 1`, but setting `--nproc_per_node 2` gave `RuntimeError: CUDA error: invalid device ordinal` because I only have one GPU. meta-llama/llama#101 (comment) looks like a possible option to "reshard" the 13B model to run on a single GPU, but I haven't investigated.
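
I haven't tried it, but the gist of that approach would presumably be to load both MP=2 shards and concatenate each tensor back together along the dimension it was sharded on. A rough, untested sketch - the column- vs row-parallel classification below is my assumption about how the checkpoint is split, not something I've verified, and the file paths are made up:

```python
import torch

# Untested: merge the two MP=2 shards of the 13B checkpoint into one.
shards = [
    torch.load("llama-2-13b/consolidated.00.pth", map_location="cpu"),
    torch.load("llama-2-13b/consolidated.01.pth", map_location="cpu"),
]

merged = {}
for key, value in shards[0].items():
    if key.endswith("norm.weight") or key == "rope.freqs":
        # Replicated on every rank - keep a single copy.
        merged[key] = value
    elif any(key.endswith(s) for s in (
            "wq.weight", "wk.weight", "wv.weight",
            "w1.weight", "w3.weight", "output.weight")):
        # Assumed column-parallel: each shard holds a slice of the rows.
        merged[key] = torch.cat([s[key] for s in shards], dim=0)
    else:
        # Assumed row-parallel (wo, w2) or the token embedding:
        # each shard holds a slice of the columns.
        merged[key] = torch.cat([s[key] for s in shards], dim=1)

torch.save(merged, "llama-2-13b-merged/consolidated.00.pth")
```

You'd then point `--ckpt_dir` at the merged folder (with `params.json` copied over) and keep `--nproc_per_node 1` - again, untested.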