Qwen Image Models Setup on Lambda Labs GPU

A complete, step-by-step guide to setting up and running both Qwen-Image (text-to-image) and Qwen-Image-Edit (image editing) on Lambda Labs cloud GPU instances.

Prerequisites

  • Lambda Labs account
  • SSH key pair generated

Step 1: Instance Setup

1.1 Choose GPU Instance

  • Recommended: 1x H100 (80 GB PCIe) - $2.49/hr
  • Alternative: 1x A100 (40 GB SXM4) - $1.29/hr
  • Base Image: Lambda Stack 22.04 (CUDA and PyTorch come pre-installed)

1.2 Launch Instance

  1. Select H100 instance type
  2. Choose "Lambda Stack 22.04" as base image
  3. Add your SSH key
  4. Launch instance

Step 2: Connect and Verify

2.1 SSH Connection

ssh ubuntu@<your-instance-ip>

2.2 Verify GPU

nvidia-smi

Expected: H100 with 81GB VRAM available

2.3 Check Python

python3 --version

Expected: Python 3.10.12

Step 3: Environment Setup

3.1 Create Virtual Environment

python3 -m venv qwen_env
source qwen_env/bin/activate
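
To confirm the virtual environment is actually active, you can print the interpreter's prefix; it should point at the qwen_env directory (if it still shows a system path such as /usr, re-run the source command above):

python3 -c "import sys; print(sys.prefix)"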

3.2 Install Dependencies

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install core libraries
pip install transformers
pip install git+https://github.com/huggingface/diffusers
pip install pillow
pip install accelerate
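
Since diffusers is installed from its development branch on GitHub, it is worth confirming that the imports resolve and recording the exact versions you are running; a quick check:

python3 -c "import diffusers, transformers; print('diffusers:', diffusers.__version__); print('transformers:', transformers.__version__)"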

3.3 Verify CUDA Setup

python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}'); print(f'GPU count: {torch.cuda.device_count()}')"

Expected output:

CUDA available: True
CUDA version: 12.1
GPU count: 1

Step 4: Model Overview

4.1 Qwen-Image (Text-to-Image)

  • Purpose: Generate images from text prompts
  • Input: Text prompt only
  • Output: New image

4.2 Qwen-Image-Edit (Image Editing)

  • Purpose: Edit existing images with text instructions
  • Input: Existing image + text prompt
  • Output: Modified image

Step 5: Usage Examples

5.1 Qwen-Image: Text-to-Image Generation

Fast Generation (8 steps, ~6 seconds)

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt, num_inference_steps=8).images[0]
image.save("generated_image.png")
print("Generated image saved as generated_image.png")

High Quality Generation (20 steps, ~15 seconds)

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(
    prompt, 
    num_inference_steps=20, 
    guidance_scale=7.5,
    negative_prompt="blurry, low quality, distorted"
).images[0]
image.save("hq_generated_image.png")
print("High quality generated image saved")

5.2 Qwen-Image-Edit: Image Editing

Fast Editing (8 steps, ~6 seconds)

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

image = pipe(image=input_image, prompt=prompt, num_inference_steps=8).images[0]
image.save("fast_output.png")
print("Fast image saved as fast_output.png")

High Quality Editing (20 steps, ~15 seconds)

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

image = pipe(image=input_image, prompt=prompt, num_inference_steps=20, guidance_scale=7.5).images[0]
image.save("quality_output.png")
print("High quality image saved as quality_output.png")

Maximum Quality Editing (50 steps, ~2 minutes)

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Turn this cat into a dog"
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

image = pipe(
    image=input_image, 
    prompt=prompt, 
    num_inference_steps=50, 
    guidance_scale=7.5,
    negative_prompt="blurry, low quality, distorted"
).images[0]
image.save("max_quality_output.png")
print("Maximum quality image saved as max_quality_output.png")

Performance Notes

  • H100 (80GB VRAM): Handles full model without quantization
  • Model size: ~60GB download
  • Inference speed (see the timing sketch below to measure these numbers on your own instance):
    • 8 steps: ~6 seconds
    • 20 steps: ~15 seconds
    • 50 steps: ~2 minutes
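
To check these timings yourself, wrap a pipeline call with a simple timer. A minimal sketch, reusing the text-to-image setup from Step 5:

import time
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
torch.cuda.synchronize()                 # make sure earlier GPU work has finished
start = time.perf_counter()
image = pipe(prompt, num_inference_steps=20).images[0]
torch.cuda.synchronize()                 # wait for generation to finish before stopping the clock
print(f"20-step generation took {time.perf_counter() - start:.1f} s")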

Troubleshooting

Common Issues

  1. CUDA not available: Verify PyTorch CUDA installation
  2. Out of memory: Reduce inference steps or use quantization
  3. Slow loading: Install accelerate for faster model loading

Memory Optimization (if needed)

pip install bitsandbytes

Then use quantized loading:

import torch
from diffusers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Note: depending on your diffusers version, pipeline-level quantization may need to be
# wrapped in a PipelineQuantizationConfig rather than passed as a raw BitsAndBytesConfig;
# check the quantization docs for the diffusers release you installed
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16
)
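
If 4-bit loading alone is not enough (for example on the 40 GB A100), diffusers pipelines also support model CPU offloading through accelerate, which trades speed for lower peak VRAM. A minimal sketch:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
# Streams submodules between CPU and GPU on demand instead of keeping the whole pipeline in VRAM;
# slower per image, but lowers peak GPU memory (uses the accelerate package installed earlier).
# Do not call pipe.to("cuda") when offloading is enabled.
pipe.enable_model_cpu_offload()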

Cost Optimization

  • H100: $2.49/hr - Best performance
  • A100: $1.29/hr - Good performance, less VRAM
  • Remember to terminate instance when done to avoid charges

Next Steps

  1. Test with your own images
  2. Experiment with different prompts
  3. Adjust inference steps based on speed/quality needs
  4. Consider setting up automatic shutdown to control costs