@prakashjayy
Last active September 16, 2024 06:55
GAN papers list

This is a list of papers and concepts you should learn to become a PRO in GAN research. We will look at each one as a step-by-step improvement in generation quality, training stability, etc.

  1. The first paper on GANs, by Ian Goodfellow and team. There are many blogs explaining how a GAN works based on this paper; a simple Google search should give you many results, and asking an LLM is even better.
  2. GANs are susceptible to mode collapse and vanishing gradients, so people moved away from the simple BCE loss to the Earth Mover (Wasserstein) distance plus some kind of regularization to avoid these issues. Read the WGAN and WGAN-GP papers to understand this (see the gradient-penalty sketch after this list).
  3. Next up is ProGAN, the first paper to my knowledge that made it possible for GANs to generate high-resolution images. Though progressive training is out of fashion, I still think one should read it [https://arxiv.org/pdf/1710.10196v3]. Its other key contributions are equalized learning rates, which rescale each layer's weights at runtime by a per-layer constant c, and a pixelwise normalization applied to each conv output so that feature vectors are unit-normalized (see the sketch after this list).
  4. StyleGAN base and its variants. I would recommend StyleGAN2 if you have access to a lot of data and StyleGAN2-ADA if you have less. StyleGAN extends ProGAN and introduces key features like the mapping network, style blocks, noise layers, the PPL regularizer, differentiable augmentation, etc. ADA simply applies the same set of differentiable transforms to both the real and generated images seen by the discriminator, so the augmentations do not leak into the generator (we don't want the generator to produce a human face rotated by 90 degrees); see the sketch after this list.
  5. GigaGAN: conditioned on text, this is the first paper to show GANs working at scale with text prompting. You can even generate 4K-resolution images.
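
A minimal sketch of the WGAN-GP gradient penalty mentioned in point 2, assuming PyTorch; `critic` stands for any discriminator-like network that returns a scalar score per sample, and `lambda_gp=10` is the commonly used coefficient.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP: penalize the critic so its gradient norm stays close to 1."""
    batch_size = real.size(0)
    # Random per-sample interpolation between real and fake images (N, C, H, W).
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(mixed)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=mixed, create_graph=True)[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()

# Critic loss: critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)
```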
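
And a minimal sketch of ProGAN's equalized learning rate and pixelwise normalization from point 3, again assuming PyTorch; the class names and the simple conv layer are illustrative, not taken from the paper's official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EqualizedConv2d(nn.Module):
    """Conv layer whose weights are rescaled at runtime by a per-layer constant c
    (the He-init constant), instead of baking the scale into the initialization."""
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.scale = (2.0 / (in_ch * kernel_size * kernel_size)) ** 0.5  # constant c
        self.padding = padding

    def forward(self, x):
        return F.conv2d(x, self.weight * self.scale, self.bias, padding=self.padding)

class PixelNorm(nn.Module):
    """Normalize each pixel's feature vector to unit length across channels."""
    def forward(self, x, eps=1e-8):
        return x / torch.sqrt(x.pow(2).mean(dim=1, keepdim=True) + eps)
```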
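
Finally, a toy sketch of the ADA idea from point 4: apply the same kind of stochastic, differentiable augmentation to both real and fake batches before the discriminator, so the generator never has to reproduce the augmentations. The flip-only augmentation and the fixed probability p here are illustrative; StyleGAN2-ADA uses a large augmentation pipeline and adapts p during training.

```python
import torch

def augment(x, p=0.3):
    """Randomly horizontal-flip each image in the batch with probability p (differentiable)."""
    flip = (torch.rand(x.size(0), device=x.device) < p).view(-1, 1, 1, 1)
    return torch.where(flip, x.flip(dims=[3]), x)

def discriminator_scores(discriminator, real, fake, p=0.3):
    # Both real and fake batches see augmented images; gradients still flow
    # back to the generator through augment(fake).
    return discriminator(augment(real, p)), discriminator(augment(fake, p))
```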

Training

  • No matter which network I've trained, it has always been crucial for me to compute gradients and clip them. While clipping, I compute a per-layer L2 norm and ensure it stays below my desired thresholds: 0.1 for the generator and 0.05 for the discriminator. This extends the training time, but I've consistently found that the loss curves smooth out nicely and decrease over time. Without this technique, I was unable to train even StyleGAN2 architectures. A minimal per-layer clipping sketch is shown below.
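
A minimal sketch of the per-layer clipping described above, assuming PyTorch; the thresholds 0.1 and 0.05 are the values from the note, everything else is illustrative.

```python
import torch

def clip_grads_per_layer(model, max_norm):
    """Clip each parameter tensor's gradient so its own L2 norm stays below max_norm."""
    for p in model.parameters():
        if p.grad is not None:
            torch.nn.utils.clip_grad_norm_([p], max_norm)

# After loss.backward() and before optimizer.step():
#   clip_grads_per_layer(generator, 0.1)
#   clip_grads_per_layer(discriminator, 0.05)
```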

Key terminology

  • Perceptual path length (PPL): take two vectors in the latent space; small steps along the interpolation between them should produce only small perceptual changes in the generated image. Used as a regularizer (path length regularization) in StyleGAN2.
  • Spectral normalization: divide the weight matrix by its largest singular value (the square root of the largest eigenvalue of WᵀW), which constrains the layer's Lipschitz constant. Specifically used in discriminators; see the sketch after this list.
  • FID score: Fréchet Inception Distance; compares the statistics of Inception-network features of real vs. generated images (lower is better).
  • Inception score: measures how confident and diverse an Inception network's class predictions are on generated images (higher is better).
  • LPIPS: Learned Perceptual Image Patch Similarity; a perceptual distance between two images computed from deep-network features.
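
A minimal example of applying spectral normalization with PyTorch's built-in utility; the tiny critic below is illustrative only.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Each wrapped layer's weight is divided by its largest singular value on every forward pass.
critic = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 1, 4)),
)
```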

Courses and books
