A complete step-by-step guide to setting up and running both Qwen-Image (text-to-image) and Qwen-Image-Edit (image editing) on Lambda Labs cloud GPU instances.
- Lambda Labs account
- SSH key pair generated
- Recommended: 1x H100 (80 GB PCIe) - $2.49/hr
- Alternative: 1x A100 (40 GB SXM4) - $1.29/hr
- Base Image: Lambda Stack 22.04 (includes CUDA and PyTorch pre-installed)
- Select H100 instance type
- Choose "Lambda Stack 22.04" as base image
- Add your SSH key
- Launch instance
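As an alternative to the dashboard steps above, instances can be launched through Lambda's cloud API. A hedged sketch, assuming the public v1 launch endpoint; the region and instance-type names here are assumptions (list valid ones via the API's instance-types endpoint):

import os
import requests  # pip install requests if not already available

# Launch a 1x H100 instance programmatically via Lambda's cloud API.
# LAMBDA_API_KEY, region_name, instance_type_name, and the SSH key
# name are placeholders you must fill in with your own values.
API_KEY = os.environ["LAMBDA_API_KEY"]
resp = requests.post(
    "https://cloud.lambdalabs.com/api/v1/instance-operations/launch",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "region_name": "us-east-1",
        "instance_type_name": "gpu_1x_h100_pcie",
        "ssh_key_names": ["<your-ssh-key-name>"],
        "quantity": 1,
    },
)
resp.raise_for_status()
print(resp.json())

Once the instance is running, connect over SSH: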
ssh ubuntu@<your-instance-ip>
nvidia-smi
Expected: H100 with 81GB VRAM available
python3 --version
Expected: Python 3.10.12
python3 -m venv qwen_env
source qwen_env/bin/activate
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Install core libraries (diffusers is installed from source, since Qwen-Image
# support may not be in the latest PyPI release yet)
pip install transformers
pip install git+https://github.com/huggingface/diffusers
pip install pillow
pip install accelerate
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}'); print(f'GPU count: {torch.cuda.device_count()}')"
Expected output:
CUDA available: True
CUDA version: 12.1
GPU count: 1
Qwen-Image:
- Purpose: Generate images from text prompts
- Input: Text prompt only
- Output: New image

Qwen-Image-Edit:
- Purpose: Edit existing images with text instructions
- Input: Existing image + text prompt
- Output: Modified image
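Both models load through the same diffusers auto-pipeline; only the repository id differs. A minimal sketch of a convenience loader (the load_pipeline helper is hypothetical, not part of either model's API):

import torch
from diffusers import DiffusionPipeline

# Hypothetical helper: both models share one loading pattern,
# differing only in the Hugging Face repository id
def load_pipeline(task: str) -> DiffusionPipeline:
    repo = "Qwen/Qwen-Image" if task == "generate" else "Qwen/Qwen-Image-Edit"
    pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.bfloat16)
    return pipe.to("cuda")

The examples that follow use this same pattern inline, starting with fast text-to-image generation: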
import torch
from diffusers import DiffusionPipeline

# Load the text-to-image pipeline in bfloat16 (fits comfortably in 80 GB VRAM)
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# Fewer inference steps trade quality for speed
image = pipe(prompt, num_inference_steps=8).images[0]
image.save("generated_image.png")
print("Generated image saved as generated_image.png")
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# More steps plus a negative prompt for higher quality output
image = pipe(
    prompt,
    num_inference_steps=20,
    guidance_scale=7.5,
    negative_prompt="blurry, low quality, distorted",
).images[0]
image.save("hq_generated_image.png")
print("High quality generated image saved")
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Load the image-editing pipeline
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

# 8 steps: fastest, lowest quality
image = pipe(image=input_image, prompt=prompt, num_inference_steps=8).images[0]
image.save("fast_output.png")
print("Fast image saved as fast_output.png")
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

# 20 steps with guidance: a good speed/quality balance
image = pipe(image=input_image, prompt=prompt, num_inference_steps=20, guidance_scale=7.5).images[0]
image.save("quality_output.png")
print("High quality image saved as quality_output.png")
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

# 50 steps plus a negative prompt: slowest, highest quality
image = pipe(
    image=input_image,
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
    negative_prompt="blurry, low quality, distorted",
).images[0]
image.save("max_quality_output.png")
print("Maximum quality image saved as max_quality_output.png")
- H100 (80GB VRAM): Handles full model without quantization
- Model size: ~60GB download
- Inference speed (see the timing sketch below):
  - 8 steps: ~6 seconds
  - 20 steps: ~15 seconds
  - 50 steps: ~2 minutes
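A rough timing harness to reproduce these numbers on your own instance (the prompt is arbitrary; torch.cuda.synchronize() ensures GPU work is finished before the clock stops, and the first run includes warmup overhead):

import time
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Time one generation at each step count
for steps in (8, 20, 50):
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe("a lighthouse at dusk", num_inference_steps=steps)
    torch.cuda.synchronize()
    print(f"{steps} steps: {time.perf_counter() - start:.1f}s")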
- CUDA not available: Verify the PyTorch CUDA installation
- Out of memory: Reduce inference steps, enable CPU offload (see the sketch below), or use quantization
- Slow loading: Install `accelerate` for faster model loading
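For out-of-memory errors, diffusers' built-in enable_model_cpu_offload() keeps only the active component on the GPU, trading speed for a much smaller VRAM footprint (useful on the 40 GB A100). A minimal sketch:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)

# Offload components to CPU and move each to the GPU only when needed;
# do not also call pipe.to("cuda") when offloading is enabled
pipe.enable_model_cpu_offload()

If offloading alone is not enough, 4-bit quantization shrinks the weights themselves: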
pip install bitsandbytes
Then load the pipeline with 4-bit quantization:

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Pipeline-level quantization in diffusers goes through PipelineQuantizationConfig
# (a bare BitsAndBytesConfig is only accepted by individual model classes);
# the component names below assume the Qwen pipeline layout
quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder"],
)
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
- H100: $2.49/hr - Best performance
- A100: $1.29/hr - Good performance, but 40 GB VRAM will likely require quantization or CPU offload for these models
- Remember to terminate instance when done to avoid charges
- Test with your own images
- Experiment with different prompts
- Adjust inference steps based on speed/quality needs
- Consider setting up automatic shutdown to control costs (see the termination sketch below)
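A hedged sketch of automatic teardown via Lambda's cloud API; the endpoint and instance_ids field follow the public v1 API docs, while the API key and instance id are yours to fill in (instance ids come from the API's instances listing):

import os
import requests  # pip install requests if not already available

# Terminate a running instance so billing stops
API_KEY = os.environ["LAMBDA_API_KEY"]
resp = requests.post(
    "https://cloud.lambdalabs.com/api/v1/instance-operations/terminate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"instance_ids": ["<your-instance-id>"]},
)
resp.raise_for_status()
print(resp.json())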