@madebyollin
Last active December 8, 2024 20:02

Reviewing the Claims of DC-AE

TL;DR - I think the paper is a good contribution and basically holds up, but Figure 2 seems suspicious and the released repo doesn't include the pieces (AE training code and pretrained 4096-element AEs) that would be needed to make DC-AE practically competitive with SD/SDXL VAEs.


DC-AE is an MIT / Tsinghua / NVIDIA paper about improving generative autoencoders (like the SD VAE) in the high spatial-compression-ratio regime.

I am interested in improved autoencoders, so this gist/thread is my attempt to analyze and review some key claims from the DC-AE paper.

(Disclaimer: I work at NVIDIA in an unrelated org :) - this review is written in my personal capacity as an autoencoder buff).


To substantiate the claim that "the released DC-AE checkpoints are not yet a practical replacement for the SDXL VAE", I checked two of the pretrained DC-AE models on my "challenge set" of 5 difficult images, and verified that their reconstructions are worse than the SDXL VAE's (as expected, since their latents are 2x smaller).
(screenshots: side-by-side reconstruction comparisons on the challenge set)
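For readers who want a quantitative handle on such side-by-side comparisons, a common per-image reconstruction metric is PSNR. This is a minimal stdlib sketch (not the paper's or my actual eval code, which operates on full image tensors), using toy pixel lists for illustration:

```python
import math

def psnr(original, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists.
    Higher is better; infinite for a perfect reconstruction."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

# toy 4-pixel "images" (hypothetical values, for illustration only)
orig = [10.0, 200.0, 30.0, 250.0]
close = [11.0, 199.0, 31.0, 249.0]  # small errors -> high PSNR (~48 dB)
far = [50.0, 150.0, 80.0, 200.0]    # large errors -> low PSNR

print(psnr(orig, close))
print(psnr(orig, far))
```

PSNR tracks raw pixel error, so it rewards blurry-but-faithful reconstructions; that is why perceptual metrics like rFID are also reported alongside it.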

Additionally, I evaluated the mit-han-lab_dc-ae-f64c128-in-1.0 VAE on COCO 2017 val at 256x256 and verified that its rFID is higher than that of the SD or SDXL VAEs on this dataset (the first screenshot is from the SDXL paper, but I've previously verified that those numbers match the results of my eval script).
(screenshots: rFID table from the SDXL paper; my eval script's results for DC-AE)
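For context on what rFID measures: it is the Fréchet distance between Inception-feature statistics of the original images and of their reconstructions. The full metric needs a matrix square root over feature covariances, but the univariate case has a simple closed form, sketched here with toy numbers (this is illustrative only, not my eval script):

```python
import math

def frechet_distance_1d(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two univariate Gaussians.
    (r)FID applies the multivariate analogue to Inception-v3 feature
    statistics of real vs. reconstructed images; lower is better."""
    return (mu1 - mu2) ** 2 + sigma1 ** 2 + sigma2 ** 2 - 2 * sigma1 * sigma2

def stats(xs):
    # mean and population standard deviation of a feature list
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

# toy "feature" samples (hypothetical values for illustration)
real = [0.0, 1.0, 2.0, 3.0]
recon_good = [3.0, 2.0, 1.0, 0.0]  # same statistics as real -> distance 0
recon_bad = [0.0, 0.0, 0.0, 4.0]   # different statistics -> distance > 0

mu_r, sd_r = stats(real)
print(frechet_distance_1d(mu_r, sd_r, *stats(recon_good)))  # 0.0
print(frechet_distance_1d(mu_r, sd_r, *stats(recon_bad)))
```

Note that because rFID compares distribution statistics rather than per-image error, a model can score well on it while still losing fine per-image detail, which is why checking individual hard images remains useful.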
