@joshalanwagner
Last active May 22, 2025 16:59
A comparison between different text-to-image models.

Wan2.1 T2V 14B vs Stable Diffusion 3.5 Large vs Flux.1 Dev

Text-to-Image Models Comparison

Process explanation follows. Comfy workflows should be intact.

Carrying a Watermelon (16:9)

SD3.5L (1536x864 : 40 steps)

WitchSD-dpmpp_2m-beta-40-50_FAV

+ movie still of a young witch in swirling black robe and black hat walks along the surface of a 
- bioluminescent 
+ lake carrying a large watermelon through fog under moonlight.
She's not carrying it in a believable way in this one, but almost everything else looks good.

Wan2.1 (1920x1080 : 30 steps)

witchHD-dpmpp_2m-beta-30-50_BEST

+movie still of a young witch in swirling black robe and black hat 
-walks along the surface of a 
+bioluminescent lake carrying a large watermelon through fog under moonlight.

The moon is in front of the clouds here. Good prompt adherence though. With UniPC, Wan will also do a nice animated style.

Flux (1920x1080 : 30 steps)

witchHD-dpmpp_2m-beta-30-53_BEST-SOFT

+ movie still of a young witch in swirling black robe and black hat walks along the surface of a 
+ bioluminescent lake carrying a large watermelon through fog under moonlight. 
- Soft focus

Chroma (1920x1080 : 40 steps)

WitchCU27-dpmpp_2m-beta-40-53_BEST

+ movie still of a young witch in swirling black robe and black hat walks along the surface of a 
+ bioluminescent lake carrying a large watermelon through fog under moonlight. 
- wrong grip

Chroma couldn't get the correct watermelon carry.

Lipstick Triangle (3:4)

SD3.5L (1152x1536 : 40 steps)

MagazineSD-dpmpp_2m-beta-40-53_BEST

+ Magazine advertisement. Kenyan supermodel closeup glitter eye shadow 
- small triangle of 
+ blue lipstick on 
- lower 
+ lip. Haute fashion. Dramatic lighting.  

Wan2.1 (1152x1536 : 20 steps)

MagazineWan-dpmpp_2m-beta-20-50_BEST

+ Magazine advertisement. Kenyan supermodel closeup glitter eye shadow 
- small triangle of 
+ blue lipstick on 
- lower 
+ lip. Haute fashion. Dramatic lighting. 

Wan was a bit unlucky with this prompt. It did better with an earlier prompt, but that wasn't challenging enough.

Flux (1152x1536 : 30 steps)

magazine-dpmpp_2m-beta-30-52_BEST

+ Magazine advertisement. Kenyan supermodel closeup glitter eye shadow 
- small triangle of 
+ blue lipstick on lower lip. Haute fashion. Dramatic lighting.  

Chroma (1152x1536 : 40 steps)

MagCU27-dpmpp_2m-beta-40-52_BEST

+ Magazine advertisement. Kenyan supermodel closeup glitter eye shadow 
- small triangle of 
+ blue lipstick on lower lip. Haute fashion. Dramatic lighting.  

I did get a result with a triangle on the lower lip, but it wasn't as good overall.

Torn Dress (9:16)

SD3.5L (864x1536 : 40 steps)

EgyptQueenHD-dpmpp_2m-beta-40-52_BEST

+ Ancient Egyptian Queen 
- in her royal dressing room dismayed to discover her dress is torn. 

Stable Diffusion really struggled with this prompt.

Wan2.1 (1080x1920 : 30 steps)

EgyptQueenHD-dpmpp_2m-beta-30-51_BEST

+ Ancient Egyptian Queen in her royal dressing room 
- dismayed 
+  to discover her dress is torn. 

She does look like she could be discovering the tear in the mirror. Other options looked more dismayed but too Caucasian.

Flux (864x1536 : 30 steps)

EgyptQueenHD-dpmpp_2m-beta-30-51_BEST

+ Ancient Egyptian Queen in her royal dressing room  dismayed to discover
- her dress is torn. 

Flux can't do torn clothing, it seems.

Chroma (864x1536 : 30 steps)

QueenCU27-dpmpp_2m-beta-30-52_BEST

+ Ancient Egyptian Queen in her royal dressing room  dismayed to discover her dress is torn. 

Winner!

Dog on Car (1:1)

SD3.5L (1216x1216 : 30 steps)

Hound-dpmpp_2m-sgm_uniform-30-52_BEST

+ Gritty comic book graphic novel style. 
+ feral hound sitting on the hood of a rusted-out muscle car in post-apocalyptic ruins of a small town.

Nailed it.

Wan2.1 (1408x1408 : 20 steps)

Hound-dpmpp_2m-beta-20-51_00001_

+ Gritty comic book graphic novel style. 
- feral hound 
+ sitting on the hood of a rusted-out muscle car in post-apocalyptic ruins of a small town.

The dog has a tag, so it's not feral.

Flux (1408x1408 : 30 steps)

Hound-dpmpp_2m-beta-30-51_BEST

+ Gritty comic book graphic novel style. 
+ feral hound sitting on the hood of a rusted-out muscle car in  
- post-apocalyptic ruins 
+ of a small town.

All the Flux results had the hound correctly placed, but the style is a mix of photographic and graphic novel, with the chrome looking too realistic.

Chroma (1408x1408 : 20 steps)

HoundChrm-euler-beta-20-53_BEST

+ Gritty comic book graphic novel style. feral hound sitting on the hood of a 
+ rusted-out muscle car in post-apocalyptic ruins of a small town.

Alien Plant (4:5)

SD3.5L (1024x1280 : 40 steps)

WindowSD-dpmpp_2m-beta-40-51_BEST

- large-format photograph. 
+ looking through a dusty broken-out window, 
- with a spider web in the upper-left corner. 
+ A giant alien weed grows casting a dark shadow, cross between a thistle and a raspberry in the 
+ overgrown backyard of an abandoned house enclosed by an old weathered fence. creepy mood.

Wan2.1 (1280x1600 : 20 steps)

WindowWan-dpmpp_2m-beta-20-50_BEST

+ large-format photograph. looking through a dusty broken-out  window, with a spider web in the upper-left corner. 
- A giant alien weed grows casting a dark shadow, cross between a thistle and 
+ a raspberry in the  overgrown backyard of an abandoned house enclosed by an old weathered fence. creepy mood.

Flux (1280x1600 : 30 steps)

WindowFlux-dpmpp_2m-beta-30-53_BEST

large-format photograph. looking through a dusty broken-out 
+ window, with a spider web in the upper-left corner. 
- A giant 
+ alien weed grows 
- casting a dark shadow, cross between a thistle and a raspberry in the 
- overgrown 
+ backyard of an abandoned house enclosed by an old weathered fence. creepy mood.

Chroma (1152x1440 : 30 steps)

WindowCU27-dpmpp_2m-beta-30-51_BEST

large-format photograph. 
+ looking through a dusty broken-out window, 
- with a spider web in the upper-left corner. 
- A giant alien weed grows 
+ casting a dark shadow, cross between a thistle and a raspberry in the overgrown 
+ backyard of an abandoned house enclosed by an old weathered fence. creepy mood.

Typography (2:3)

SD3.5L (960x1440 : 30 steps)

Invitation-dpmpp_2m-sgm_uniform-30-52_BEST

+ Invitation for a baby shower. Generic clipart image of a stork carrying a baby. 
+ Headline "We are expecting you!" and smaller text that says "Feb 24 at the Grizwalt Hotel" and "Games and Prizes!"
+ Graphic design in pastel colors. 

Type treatment is not awesome, but got most of the letters right!

Wan2.1 (1152x1728 : 20 steps)

invitation-dpmpp_2m-beta-20-50_BEST

+ Invitation for a baby shower. Generic clipart image of a stork carrying a baby. 
- Headline "We are expecting you!" and smaller text that says "Feb 24 at the Grizwalt Hotel" and "Games and Prizes!"
+ Graphic design in pastel colors. 

Flux (960x1440 : 30 steps)

invitationFlux-dpmpp_2m-beta-30-50_BEST

+ Invitation for a baby shower. Generic clipart image of a stork carrying a baby. 
+ Headline "We are expecting you!" 
- and smaller text that says "Feb 24 at the Grizwalt Hotel" 
- and "Games and Prizes!"
+ Graphic design in pastel colors. 

Flux didn't do well at the larger size.

Chroma (1024x1356 : 30 steps)

InvitationChrm-euler-beta-30-52_BEST

+ Invitation for a baby shower. 
Generic clipart image of a stork carrying a baby. 
+ Headline "We are expecting you!" and smaller text that says "Feb 24 at the Grizwalt Hotel" 
- and "Games and Prizes!"
+ Graphic design in pastel colors. 

UI Design (8:5)

SD3.5L (1536x960 : 40 steps)

UI-dpmpp_2m-beta-40-52_BEST

+ user interface of a sleek modern video editing software application. 
+ It has a timeline on top for video clips. 
+ a preview in the middle. 
+ a clips browser on the left. 
+ a panel for adding effects on the right. UI design.

Doesn't look the best, but it did follow instructions.

Wan2.1 (1536x960 : 20 steps)

UI-dpmpp_2m-beta-20-52_BEST

+ user interface of a sleek modern video editing software application. 
- It has a timeline on top for video clips. 
a preview in the middle. 
a clips browser on the left. 
+ a panel for adding effects on the right. UI design.

Wan seems influenced by DaVinci Resolve.

Flux (1536x960 : 30 steps)

UIFlux-dpmpp_2m-beta-30-53_BEST

+ user interface of a sleek modern video editing software application. 
- It has a timeline on top for video clips. 
a preview in the middle. 
+ a clips browser on the left. 
a panel for adding effects on the right. UI design.

Flux seems influenced a lot by iMovie.

Chroma (1536x960 : 30 steps)

UI_CU27-dpmpp_2m-beta-30-51_BEST

+ user interface of a sleek modern video editing software application. 
- It has a timeline on top for video clips. 
a preview in the middle. 
+ a clips browser on the left. 
+ a panel for adding effects on the right. UI design.

Notes

Inference speeds:

  • SD Fastest
  • Flux Medium
  • Chroma Medium
  • Wan Slowest

I ran four iterations of each prompt (seeds 50-53) and chose my favorite for prompt adherence and aesthetics. I tried the recommended UniPC/simple for Wan, but ended up using DPM++ 2M/beta for better results. I did my best to write prompts that didn't advantage any particular model.
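The image names above appear to encode the run settings as label, sampler, scheduler, steps, and seed, followed by a tag such as BEST. That pattern is inferred from the names themselves, not stated anywhere, so treat this as a rough sketch. `RunInfo` and `parse_run_name` are hypothetical helpers for sorting a folder of outputs by settings.

```python
import re
from typing import NamedTuple, Optional


class RunInfo(NamedTuple):
    label: str      # e.g. "WitchSD"
    sampler: str    # e.g. "dpmpp_2m"
    scheduler: str  # e.g. "beta"
    steps: int
    seed: int
    tag: str        # e.g. "BEST"; empty if the name has no suffix


# Assumed layout: <label>-<sampler>-<scheduler>-<steps>-<seed>_<tag>
# This is inferred from the image names, not a documented convention.
_PATTERN = re.compile(
    r"^(?P<label>[^-]+)-(?P<sampler>[^-]+)-(?P<scheduler>[^-]+)"
    r"-(?P<steps>\d+)-(?P<seed>\d+)(?:_(?P<tag>.*))?$"
)


def parse_run_name(name: str) -> Optional[RunInfo]:
    """Split an output name into its settings, or return None if it doesn't fit."""
    m = _PATTERN.match(name.strip())
    if m is None:
        return None
    return RunInfo(
        label=m["label"],
        sampler=m["sampler"],
        scheduler=m["scheduler"],
        steps=int(m["steps"]),
        seed=int(m["seed"]),
        tag=m["tag"] or "",
    )


if __name__ == "__main__":
    for name in (
        "WitchSD-dpmpp_2m-beta-40-50_FAV",
        "Hound-dpmpp_2m-sgm_uniform-30-52_BEST",
    ):
        print(parse_run_name(name))
```

Names that don't fit the assumed layout return `None` rather than raising, so a mixed folder can be filtered without special cases.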
I focused on challenging concepts, trying to find the edges where things break, to reveal where models excel at comprehension and prompt adherence. The sampler/scheduler combination had a large impact on the nature of some of the results. It seems possible that UniPC works better for Wan video and worse for Wan images. These are all rendered at large resolutions because I often need higher resolutions. I've learned that the resolution ceiling is a soft ceiling: more iterations can sometimes make higher resolutions work.
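For comparing how large each render is, a small sketch like the following reduces a width and height to an aspect ratio and a megapixel count. The helper names `aspect_ratio` and `megapixels` are made up for illustration; the sample sizes are taken from the headings above.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple:
    """Reduce width:height to lowest terms, e.g. 1536x864 -> (16, 9)."""
    d = gcd(width, height)
    return (width // d, height // d)


def megapixels(width: int, height: int) -> float:
    """Total pixel count in millions."""
    return width * height / 1_000_000


if __name__ == "__main__":
    # Sizes copied from the section headings above.
    sizes = [(1536, 864), (1920, 1080), (1152, 1536), (1408, 1408), (1536, 960)]
    for w, h in sizes:
        rw, rh = aspect_ratio(w, h)
        print(f"{w}x{h}: {rw}:{rh}, {megapixels(w, h):.2f} MP")
```

The megapixel figure gives a rough way to compare how hard each run pushes a model, since two sizes with the same aspect ratio can differ a lot in pixel count (1536x864 versus 1920x1080, for example).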

Bonus - Chinchilla in a Diner!

SD3.5

ChinchillaSD-dpmpp_2m-beta-40-50_BEST 10/10

Wan2.1

ChinchillaHD-uni_pc-simple-40-54_BEST 8/10

Flux

ChinchillaFlux-dpmpp_2m-beta-30-50_Best 7/10 - (rabbit)

Chroma

ChinchCU27-dpmpp_2m-beta-40-53_BEST 8/10
