
Typical approaches to training and sampling from denoising diffusion models yield results whose per-item means match that of the initial input - i.e. approximately zero when sampling starts from i.i.d. draws of a standard normal distribution. This has major implications for what outputs can be obtained from popular text-to-image generative models; see e.g. https://twitter.com/apeoffire/status/1624884816851206145 and https://www.crosslabs.org/blog/diffusion-with-offset-noise.

It also means we can reliably produce dark, bright, or tinted images by shifting the mean of the initial input toward a desired color.

Now, I was curious what would happen if I made Stable Diffusion denoise an "impossible" image whose mean color exceeds the valid [0,1] RGB range:

# Solid RGB-(1.5, 1.5, 1.5) image of shape (1, 3, 512, 512) -> VAE latent, plus noise at the maximum sigma:
init_latent = vae_encode(tensor([1.5, 1.5, 1.5])[None, :, None, None].tile(1, 1, 512, 512)) + sigma_max * randn(1, 4, 64, 64)
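
Spelled out, that one-liner might look roughly like the following PyTorch / Hugging Face diffusers sketch, with vae_encode standing in for the VAE encoder. The checkpoint name, the 0.18215 latent scaling factor, the use of the posterior mean rather than a sample, and sigma_max ≈ 14.6 (the largest noise level k-diffusion samplers use for SD 1.x) are my assumptions for illustration, not details taken from this gist:

import torch
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed setup: the VAE from an SD 1.x checkpoint and the largest noise level
# of a k-diffusion sampler (sigma_max is roughly 14.6 for SD 1.x).
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae").to(device)
sigma_max = 14.6
scaling_factor = 0.18215  # latent scaling factor used by SD 1.x

# A solid "impossible" image: every pixel is RGB (1.5, 1.5, 1.5), i.e. 50% brighter
# than pure white. The VAE expects pixels scaled to [-1, 1], so 1.5 maps to 2.0.
pixels = torch.full((1, 3, 512, 512), 1.5, device=device) * 2.0 - 1.0

with torch.no_grad():
    latent = vae.encode(pixels).latent_dist.mean * scaling_factor

# Start denoising from this latent plus maximal noise - effectively img2img
# from the impossible image at full strength.
init_latent = latent + sigma_max * torch.randn(1, 4, 64, 64, device=device)

A k-diffusion sampler seeded with this init_latent in place of its usual sigma_max * randn(...) starting noise then runs the experiment described above.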