This guide is based on another, more comprehensive one that can be found at . All of the necessary environment has been packaged into a Docker image that runs easily on different cloud providers. (Disclaimer: there seems to be a version mismatch with xformers, so xformers is left disabled because it won't work.)
Note: the Docker image used for this exercise is based on the Stable Diffusion 1.5 release and includes the updated VAEs. By using the model contained in this image, or any fine-tuned models derived from it, you agree to the terms of use.
- Make sure you have an account with and some credit.
- Navigate to and filter by the A100 GPU type (other GPUs may work, but will probably be slower).
- Click the "Edit Image Config" button to customize the image
- In the popup, select the jupyter-python notebook option and check "Use Jupyter Lab". For the image, enter
- Select and Save
- Rent a machine with the desired number of GPUs. For this tutorial, I selected a machine with 2 PCIe A100 cards.
- Navigate to and you should see your instance starting up. This takes a few minutes, because the Docker image is about 17 GB and has to be downloaded onto the instance before you can connect (a new custom image is also built each time).
- Once the instance has finished loading the image, you should see an "Open" button appear on the right.
- Click on the Open button to launch the Jupyter Lab interface in another browser tab. It should look something like this:
- Before fine-tuning, you need to add training data. For this tutorial, I used the dataset that was used to train Pokemon Diffusers. I converted it to the expected format, in which each image file (.jpg) is paired with a caption file (.caption, a plain-text file). You can find a zip file of the dataset here.
- The zip file is fairly large, so it's easiest to download it directly onto the instance. Open a terminal from the launcher and run:
cd /
wget && unzip && mv pokemon train_data
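Before going further, it can be worth confirming that every image actually has a caption. The sketch below is a hypothetical helper, not part of the training scripts. It assumes images and captions are paired by file base name (e.g. `0001.jpg` with `0001.caption`) and that the data ended up in `/train_data` after the commands above.

```python
from pathlib import Path


def find_unpaired(filenames, image_ext=".jpg", caption_ext=".caption"):
    """Return (images with no caption, captions with no image), matched by file stem."""
    names = [Path(name) for name in filenames]
    # Suffix matching is case-sensitive, so ".JPG" files are not counted as images.
    images = {p.stem for p in names if p.suffix == image_ext}
    captions = {p.stem for p in names if p.suffix == caption_ext}
    return sorted(images - captions), sorted(captions - images)


def check_dataset(data_dir):
    """Hypothetical wrapper: run find_unpaired over the files directly inside data_dir."""
    return find_unpaired(p.name for p in Path(data_dir).iterdir() if p.is_file())
```

For example, `missing_captions, orphan_captions = check_dataset("/train_data")` should return two empty lists if the dataset is complete.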
- Activate the Python virtual environment:
source venv_diffusers/bin/activate
- Configure Accelerate. This is where you set Accelerate up to use multiple GPUs; since this machine has 2, I configure it for multi-GPU training:
/$ accelerate config
In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0
Which type of machine are you using? ([0] No distributed training, [1] multi-CPU, [2] multi-GPU, [3] TPU [4] MPS): 2
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Do you want to use DeepSpeed? [yes/NO]:
Do you want to use FullyShardedDataParallel? [yes/NO]:
Do you want to use Megatron-LM ? [yes/NO]:
How many GPU(s) should be used for distributed training? [1]:2
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:
Do you wish to use FP16 or BF16 (mixed precision)? [NO/fp16/bf16]: bf16
- Merge the captions into a metadata file:
python train_data meta_cap.json
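The merge step collects the individual caption files into a single JSON file. The sketch below shows the general shape of that operation under an assumption not confirmed here: entries are keyed by the image's file stem, with the text stored under a `caption` field. The function names are hypothetical, and the actual script may use different keys or store additional fields.

```python
import json
from pathlib import Path


def build_metadata(pairs):
    """Build {image_stem: {"caption": text}} from (image_stem, caption_text) pairs."""
    return {stem: {"caption": text.strip()} for stem, text in pairs}


def load_pairs(data_dir, image_ext=".jpg", caption_ext=".caption"):
    """Yield (stem, caption_text) for every image that has a matching caption file."""
    for image in sorted(Path(data_dir).glob(f"*{image_ext}")):
        caption_file = image.with_suffix(caption_ext)
        if caption_file.exists():
            yield image.stem, caption_file.read_text(encoding="utf-8")


def merge_captions(data_dir, out_path):
    """Write the merged metadata as JSON (assumed format) and return it."""
    metadata = build_metadata(load_pairs(data_dir))
    Path(out_path).write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata
```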
- Convert the images to latents and bucket them by resolution:
python train_data meta_cap.json meta_lat.json model.ckpt --batch_size 4 --max_resolution 512,512 --mixed_precision no
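Bucketing groups images by aspect ratio, so they don't all have to be cropped to a square. Below is a simplified sketch of the idea, not the script's actual code. It assumes three things about the buckets:
- bucket sides are multiples of 64;
- sides lie between 256 and 1024 pixels;
- bucket area does not exceed the `--max_resolution` area (512 × 512).

Each image is then assigned to the bucket with the closest aspect ratio.

```python
def make_buckets(max_width=512, max_height=512, min_side=256, max_side=1024, step=64):
    """Return sorted (width, height) buckets whose area fits within max_width * max_height."""
    max_area = max_width * max_height
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest multiple of `step` that keeps width * height within the area budget.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # portrait counterpart
    return sorted(buckets)


def assign_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))
```

With this scheme, a square image lands in the 512 × 512 bucket, while a wide image gets a landscape bucket of similar area instead of being cropped.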
- Launch the training script. This is the part that takes a while. Feel free to tweak parameters as you see fit.
accelerate launch --num_cpu_threads_per_process 8 --pretrained_model_name_or_path=model.ckpt --in_json meta_lat.json --train_data_dir=train_data --output_dir=fine_tuned --shuffle_caption --train_batch_size=1 --learning_rate=5e-6 --max_train_steps=10000 --use_8bit_adam --gradient_checkpointing --mixed_precision=bf16 --save_every_n_epochs=4
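Before launching, it can help to estimate how many epochs `--max_train_steps=10000` corresponds to, and how many intermediate checkpoints `--save_every_n_epochs=4` will produce. The sketch below is a rough back-of-the-envelope calculation, not the script's own logic. It assumes two things:
- each step consumes `train_batch_size` images on each of the 2 GPUs, with no gradient accumulation;
- the partially filled batches that bucketing can introduce are ignored.

```python
import math


def training_plan(num_images, max_train_steps=10000, batch_size=1, num_gpus=2,
                  save_every_n_epochs=4):
    """Rough estimate of epochs and intermediate checkpoints for a data-parallel run."""
    images_per_step = batch_size * num_gpus  # effective (global) batch size
    steps_per_epoch = math.ceil(num_images / images_per_step)
    epochs = math.ceil(max_train_steps / steps_per_epoch)
    # Approximately one intermediate checkpoint every `save_every_n_epochs` epochs.
    checkpoints = epochs // save_every_n_epochs
    return {
        "images_per_step": images_per_step,
        "steps_per_epoch": steps_per_epoch,
        "epochs": epochs,
        "checkpoints": checkpoints,
    }
```

For a hypothetical 800-image dataset, `training_plan(800)` works out to 400 steps per epoch and 25 epochs, so about 6 intermediate checkpoints.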
- Once training has completed, the checkpoint files are saved in the  folder. To make them visible in the Jupyter Lab file browser, run the following:
mv fine_tuned /root/
- Now you can use the file manager to navigate to the  folder and download the new fine-tuned checkpoint files. The one with the most training is last.ckpt; earlier checkpoints are named by incremental epoch number.
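To pick a checkpoint programmatically rather than by eye, the sketch below orders the `.ckpt` files from least to most trained. It assumes the epoch number is the last run of digits in each filename (the `epoch-000004.ckpt` style names used to exercise it are hypothetical). It also assumes `last.ckpt` is always the most-trained file.

```python
import re


def order_checkpoints(filenames):
    """Sort .ckpt filenames from least to most trained; last.ckpt goes at the end."""
    def sort_key(name):
        if name == "last.ckpt":
            return (1, 0)
        digits = re.findall(r"\d+", name)
        # Files without any digits sort before the numbered ones.
        return (0, int(digits[-1]) if digits else -1)

    return sorted((name for name in filenames if name.endswith(".ckpt")), key=sort_key)
```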