Hivemind only works on Linux. Thankfully you can still use WSL to run the training.
Follow this guide: https://learn.microsoft.com/es-es/windows/wsl/install
Basically, you should open a CMD window (the black window with white text) and type:
wsl --install -d ubuntu
Once you run it, Ubuntu will install under WSL on your computer. It will ask you to create an account (username and password) and then drop you into a terminal. Once you are there you can follow the rest of this guide. If you need help, ask me on the Discord.
Install the following packages corresponding to your distribution:
htop screen psmisc python3-pip unzip wget gcc g++ nano
On Ubuntu: sudo apt-get install htop screen psmisc python3-pip unzip wget gcc g++ nano -y
Then, install the Python packages (the version specifiers containing ">" must be quoted, otherwise the shell treats ">" as a redirect):
pip install "diffusers>=0.5.1" numpy==1.23.4 wandb==0.13.4 torch torchvision "transformers>=4.21.0" "huggingface-hub>=0.10.0" Pillow==9.2.0 tqdm==4.64.1 ftfy==6.1.1 bitsandbytes pynvml~=11.4.1 psutil~=5.9.0 accelerate==0.13.1 scipy==1.9.3 hivemind triton==2.0.0.dev20221120
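After the install finishes, one quick way to confirm the key packages are present is to query importlib.metadata. This checker is my addition, not part of the trainer:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(pkgs):
    """Map each package name to its installed version string, or None if missing."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = version(pkg)
        except PackageNotFoundError:
            out[pkg] = None
    return out

# Names taken from the pip command above; add the rest as needed.
report = installed_versions(["diffusers", "transformers", "torch", "hivemind"])
for name, ver in report.items():
    print(name, ver if ver else "NOT INSTALLED")
```

Anything printed as NOT INSTALLED means the pip command above did not complete for that package.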
Optionally, install xformers for a larger batch size:
conda install xformers -c xformers/label/dev
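Since xformers is optional, the trainer's --use_xformers flag should match what is actually installed. A small importability check (my addition) you can run before editing run.sh:

```python
import importlib.util

def has_xformers():
    """True if the xformers package is importable in this environment."""
    return importlib.util.find_spec("xformers") is not None

# Use this to decide whether to pass --use_xformers="true" to the trainer.
print("xformers available:", has_xformers())
```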
The trainer is available here: https://gist.github.com/chavinlo/7b03320b1a519c47edd365835366aee5
To download it directly into your instance, you can use wget:
wget https://gist.githubusercontent.com/chavinlo/7b03320b1a519c47edd365835366aee5/raw/f394b89b01a423d4d0a6cb5ad61e6ec49c2e9358/trainer.py
If your system does not have wget, you can use curl, which is included on most distributions:
curl https://gist.githubusercontent.com/chavinlo/7b03320b1a519c47edd365835366aee5/raw/f394b89b01a423d4d0a6cb5ad61e6ec49c2e9358/trainer.py -o trainer.py
For now, the trainer only supports CLI flags. I will add a YAML config soon.
Write the following into a text file named "run.sh":
torchrun --nproc_per_node=1 \
trainer.py \
--workingdirectory hivemindtemp \
--wantedimages 500 \
--datasetserver="DATASET_SERVER_IP" \
--node="true" \
--o_port1=LOCAL_TCP_PORT \
--o_port2=LOCAL_UDP_PORT \
--ip_is_different="true" \
--p_ip="PUBLIC_IP" \
--p_port1=PUBLIC_TCP_PORT \
--p_port2=PUBLIC_UDP_PORT \
--batch_size 2 \
--use_xformers="true" \
--save_steps 1000 \
--image_log_steps 400 \
--hf_token="YOUR HUGGINGFACE TOKEN" \
--model runwayml/stable-diffusion-v1-5 \
--run_name testrun1 \
--gradient_checkpointing="true" \
--use_8bit_adam="false" \
--fp16="true" \
--resize="true" \
--wandb="false" \
--no_migration="true"
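If you would rather create the file straight from the terminal, a quoted heredoc works. This is only a sketch: the flag list here is abbreviated, so copy the full set from the template above into the body:

```shell
# Write run.sh from the terminal. The quoted 'EOF' keeps backslashes and
# quotes literal. Only a few flags are shown -- paste the full list above.
cat > run.sh <<'EOF'
torchrun --nproc_per_node=1 \
trainer.py \
--workingdirectory hivemindtemp \
--run_name testrun1
EOF
chmod +x run.sh
```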
OR
Download the following file:
Via Wget:
wget https://gist.githubusercontent.com/chavinlo/35e304fc0015dc746d270caa1e327111/raw/efadd14db24aef14cf3143f5bf4456014cdc0e36/run.sh
Via curl:
curl https://gist.githubusercontent.com/chavinlo/35e304fc0015dc746d270caa1e327111/raw/efadd14db24aef14cf3143f5bf4456014cdc0e36/run.sh -o run.sh
Once you have it on your computer run: chmod +x run.sh
Now, go to https://huggingface.co/runwayml/stable-diffusion-v1-5 and accept the terms with your HuggingFace account.
This is the most important part. On the file you just created you need to change certain parts:
DATASET_SERVER_IP --> Provided Dataset Server IP (check discord)
LOCAL_TCP_PORT --> Local port to get TCP requests
LOCAL_UDP_PORT --> Local port to get UDP requests
YOUR HUGGINGFACE TOKEN -> Your HuggingFace Token, you can create or find one here: https://huggingface.co/settings/tokens
If you don't want to extend the network:
change ip_is_different="true" to ip_is_different="false"
change PUBLIC_IP to 127.0.0.1
change PUBLIC_TCP_PORT to 0
change PUBLIC_UDP_PORT to 0
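Those substitutions can also be done with sed. This sketch builds a minimal stand-in run.sh just to demonstrate the replacements (your real file has the full flag list, so skip the printf step):

```shell
# Stand-in file containing only the placeholder lines from the template:
printf '%s\n' \
    '--ip_is_different="true" \' \
    '--p_ip="PUBLIC_IP" \' \
    '--p_port1=PUBLIC_TCP_PORT \' \
    '--p_port2=PUBLIC_UDP_PORT \' > run.sh
# Local-only (no network extension) values, as described above:
sed -i \
    -e 's/ip_is_different="true"/ip_is_different="false"/' \
    -e 's/PUBLIC_IP/127.0.0.1/' \
    -e 's/PUBLIC_TCP_PORT/0/' \
    -e 's/PUBLIC_UDP_PORT/0/' \
    run.sh
cat run.sh
```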
Please: if you don't know what port forwarding is, or you are not sure how your network is configured, DO NOT do the steps below!!!!!
ONLY if you would like to extend the network (NOT REQUIRED!!!!), and your instance is behind a firewall or NAT with port forwarding set up:
leave ip_is_different="true" as it is
change PUBLIC_IP to your public IP
change PUBLIC_TCP_PORT to the PUBLIC port that forwards to your LOCAL_TCP_PORT
change PUBLIC_UDP_PORT to the PUBLIC port that forwards to your LOCAL_UDP_PORT
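Before putting ports into run.sh, it's worth checking that nothing else on your machine is already using the LOCAL ports you chose. This helper is my addition, and it only checks local binding, not whether the public forward actually reaches you; the port numbers in the example are arbitrary:

```python
import socket

def port_is_free(port, proto=socket.SOCK_STREAM, host="0.0.0.0"):
    """True if we can bind a socket of the given type to the local port."""
    with socket.socket(socket.AF_INET, proto) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# Example: check a candidate TCP and UDP port before using them in run.sh
print("tcp free:", port_is_free(41952))
print("udp free:", port_is_free(41953, proto=socket.SOCK_DGRAM))
```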
If you want to increase the batch size:
change the "2" next to "batch_size" to a higher number. Each increase of 1 usually costs about 2GB of VRAM.
For RTX 3090 users, 2 is the maximum batch size with xformers enabled. If you don't have xformers installed, set it to 1 AND change "use_xformers="true"" to "use_xformers="false"".
If you want to process more images per round (and spend less time downloading files), change "--wantedimages 500" to "--wantedimages 1000" or some higher number.
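The batch-size guidance above boils down to a rough rule of thumb. Here is a tiny estimator; the ~20GB baseline is my back-of-envelope number derived from "24GB minimum, batch size 2 max on a 3090", not a measured value:

```python
def approx_vram_gb(batch_size, base_gb=20.0, per_item_gb=2.0):
    """Very rough VRAM estimate: assumed baseline plus ~2GB per unit of batch size."""
    return base_gb + per_item_gb * batch_size

# A 24GB card (e.g. RTX 3090) tops out around batch_size=2 by this estimate:
print(approx_vram_gb(2))  # -> 24.0
```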
Do not change anything else; the remaining flags are already set for you.
I will release a MUCH simpler setup soon, along with PROPER documentation.
If you need help, go to the private channel at https://discord.gg/NPQsdPeA
DATASET_SERVER_IP is in IP:PORT format.
A minimum of 24GB of VRAM is required.
Running is as simple as:
./run.sh
(if you did chmod +x run.sh earlier -- if not, do: bash ./run.sh)