@madebyollin
Last active December 8, 2024 20:02

Reviewing the Claims of DC-AE

TL;DR - I think the paper is a good contribution and basically holds up, but Figure 2 seems suspicious and the released repo doesn't include the pieces (AE training code and pretrained 4096-element AEs) that would be needed to make DC-AE practically competitive with SD/SDXL VAEs.


DC-AE is an MIT / Tsinghua / NVIDIA paper about improving generative autoencoders (like the SD VAE) in the high spatial-compression-ratio regime.

I am interested in improved autoencoders, so this gist/thread is my attempt to analyze and review some key claims from the DC-AE paper.

(Disclaimer: I work at NVIDIA in an unrelated org :) - this review is written in my personal capacity as an autoencoder buff).


To substantiate the claim that "the released DC-AE checkpoints are not yet a practical replacement for the SDXL VAE", I checked two of the pretrained DC-AE models on my "challenge set" of 5 difficult images, and verified that their reconstructions are worse than the SDXL VAE's (as expected, since their latents are 2x smaller).
(screenshots: side-by-side reconstruction comparisons on the challenge set)
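For readers who want a quantitative handle on such side-by-side comparisons, a common per-image reconstruction metric is PSNR. This is a minimal stdlib sketch (not the paper's or my actual eval code, which operates on full image tensors), using toy pixel lists for illustration:

```python
import math

def psnr(original, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists.
    Higher is better; infinite for a perfect reconstruction."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

# toy 4-pixel "images" (hypothetical values, for illustration only)
orig = [10.0, 200.0, 30.0, 250.0]
close = [11.0, 199.0, 31.0, 249.0]  # small errors -> high PSNR (~48 dB)
far = [50.0, 150.0, 80.0, 200.0]    # large errors -> low PSNR

print(psnr(orig, close))
print(psnr(orig, far))
```

PSNR tracks raw pixel error, so it rewards blurry-but-faithful reconstructions; that is why perceptual metrics like rFID are also reported alongside it.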

Additionally, I evaluated the mit-han-lab_dc-ae-f64c128-in-1.0 VAE on COCO 2017 val at 256x256 and verified that its rFID is higher than that of the SD or SDXL VAEs on this dataset (the first screenshot is from the SDXL paper, but I've previously verified that those numbers match the results of my eval script).
(screenshots: rFID table from the SDXL paper; my eval script's results for DC-AE)
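For context on what rFID measures: it is the Fréchet distance between Inception-feature statistics of the original images and of their reconstructions. The full metric needs a matrix square root over feature covariances, but the univariate case has a simple closed form, sketched here with toy numbers (this is illustrative only, not my eval script):

```python
import math

def frechet_distance_1d(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two univariate Gaussians.
    (r)FID applies the multivariate analogue to Inception-v3 feature
    statistics of real vs. reconstructed images; lower is better."""
    return (mu1 - mu2) ** 2 + sigma1 ** 2 + sigma2 ** 2 - 2 * sigma1 * sigma2

def stats(xs):
    # mean and population standard deviation of a feature list
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

# toy "feature" samples (hypothetical values for illustration)
real = [0.0, 1.0, 2.0, 3.0]
recon_good = [3.0, 2.0, 1.0, 0.0]  # same statistics as real -> distance 0
recon_bad = [0.0, 0.0, 0.0, 4.0]   # different statistics -> distance > 0

mu_r, sd_r = stats(real)
print(frechet_distance_1d(mu_r, sd_r, *stats(recon_good)))  # 0.0
print(frechet_distance_1d(mu_r, sd_r, *stats(recon_bad)))
```

Note that because rFID compares distribution statistics rather than per-image error, a model can score well on it while still losing fine per-image detail, which is why checking individual hard images remains useful.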
