A Python 3 and Shell Script solution for running DeepFloyd IF on the Nvidia Tesla K80
deepfloyd-IF_K80_12gb_gpu1.py
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch, sys, random, gc

# A function to free up GPU memory between stages.
def flush():
    gc.collect()
    torch.cuda.empty_cache()

# Pass the image prompt from the command line in quotes "".
prompt = sys.argv[1]

# Load the first stage of DeepFloyd IF with fp16 precision and enable CPU offload
# on the first GPU (CUDA device 0).
stage_1 = DiffusionPipeline.from_pretrained("../huggingface/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload(0)

# Load the second stage the same way; text_encoder=None because the prompt
# embeddings produced by stage 1 are reused, saving the T5 encoder's memory.
stage_2 = DiffusionPipeline.from_pretrained("../huggingface/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16)
stage_2.enable_model_cpu_offload(0)

# Encode the prompt and get the embeddings.
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# Set a random seed for the generator; the seed is recorded in the output filename.
seed = random.randint(0, 999999999)
generator = torch.manual_seed(seed)

# Run the first stage and save the 64x64 image as a PNG file in the small_images folder.
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("../small_images/" + prompt + "_" + str(seed) + "_S1_gpu1.png")
print("Stage 1 complete.")

# Delete the first stage model and flush the memory before stage 2 runs.
del stage_1
flush()

# Run the second stage and save the 256x256 image as a PNG file in the images folder.
image = stage_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("../images/" + prompt + "_" + str(seed) + "_S2_gpu1.png")
print("Stage 2 complete.")
deepfloyd-IF_K80_12gb_gpu2.py (identical to the gpu1 script except for the GPU index and output filenames)
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch, sys, random, gc

# A function to free up GPU memory between stages.
def flush():
    gc.collect()
    torch.cuda.empty_cache()

# Pass the image prompt from the command line in quotes "".
prompt = sys.argv[1]

# Load the first stage of DeepFloyd IF with fp16 precision and enable CPU offload
# on the second GPU (CUDA device 1).
stage_1 = DiffusionPipeline.from_pretrained("../huggingface/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload(1)

# Load the second stage the same way; text_encoder=None because the prompt
# embeddings produced by stage 1 are reused, saving the T5 encoder's memory.
stage_2 = DiffusionPipeline.from_pretrained("../huggingface/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16)
stage_2.enable_model_cpu_offload(1)

# Encode the prompt and get the embeddings.
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# Set a random seed for the generator; the seed is recorded in the output filename.
seed = random.randint(0, 999999999)
generator = torch.manual_seed(seed)

# Run the first stage and save the 64x64 image as a PNG file in the small_images folder.
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("../small_images/" + prompt + "_" + str(seed) + "_S1_gpu2.png")
print("Stage 1 complete.")

# Delete the first stage model and flush the memory before stage 2 runs.
del stage_1
flush()

# Run the second stage and save the 256x256 image as a PNG file in the images folder.
image = stage_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("../images/" + prompt + "_" + str(seed) + "_S2_gpu2.png")
print("Stage 2 complete.")
loop.sh
#!/bin/bash
# Endlessly re-run the command given as arguments; "$@" expands to that
# command with its original quoting intact, and the one-second sleep
# paces the restarts.
while true
do
    "$@"
    sleep 1
done
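Because loop.sh re-executes whatever command follows it, a run that dies with an out-of-memory error during model loading is simply retried a second later, and successful runs are restarted the same way, giving continuous generation.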
Instructions: The authors recommend 14 GB of VRAM for running the largest 1st-stage (IF-I-XL-v1.0) and 2nd-stage (IF-II-L-v1.0) models, and 16 GB for the 3rd-stage upscaler. The upscaler is not especially effective in my opinion and can be swapped for other solutions like Upscayl; the 1st and 2nd stages are the important parts. After a little tinkering, I was able to get generation working with just the 12 GB of VRAM that each of the K80's two GPUs provides, without any 8-bit hacks.

There is almost exactly enough VRAM to pull this off, but looping inside the Python script causes an out-of-memory error; simply re-running the script works fine. To that end, I attached a loop.sh script that can be used to loop the Python scripts. This works without error when running on only GPU1 or only GPU2. It also works when running both K80 GPUs, with one small caveat: both GPUs store a bit of data on GPU1, which occasionally pushes its VRAM over the limit during the model-loading phase and produces nasty (but harmless) out-of-memory errors. loop.sh simply tries again and usually succeeds on the second attempt, once the memory is freed. This happens only on GPU1, only with both GPUs running, and only during model loading, so no generation time is wasted and the output rate is barely affected. A small price to pay for making the K80 do something it has no business doing.
Example usage:
./loop.sh python3 deepfloyd-IF_K80_12gb_gpu1.py "A photorealistic picture of blue cats on the sidewalk."
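To run both K80 GPUs at once, the same pattern works with one looped script per GPU. A minimal sketch, assuming the second script is named deepfloyd-IF_K80_12gb_gpu2.py to match its output filenames:

./loop.sh python3 deepfloyd-IF_K80_12gb_gpu1.py "A photorealistic picture of blue cats on the sidewalk." &
./loop.sh python3 deepfloyd-IF_K80_12gb_gpu2.py "A photorealistic picture of blue cats on the sidewalk." &

Backgrounding each loop with & lets both run concurrently from one shell; expect the occasional harmless out-of-memory retry on GPU1 during model loading, as described above.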