- The paper presents Deep Convolutional Generative Adversarial Nets (DCGAN), an architecturally constrained variant of GAN.
- Link to the paper
- Stable to train
- Very useful for learning unsupervised image representations.
- GANs are difficult to scale using CNNs.
- The paper proposes the following changes to GANs:
- Replace any pooling layers with strided convolutions (for the discriminator) and fractionally strided convolutions (for the generator).
- Remove fully connected hidden layers.
- Use batch normalisation in both generator (all layers except output layer) and discriminator (all layers except input layer).
- Use LeakyReLU in all layers of the discriminator.
- Use ReLU activation in all layers of the generator (except the output layer, which uses Tanh).
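
As a rough illustration of the first change in the list above, the arithmetic below shows how strided convolutions shrink a feature map in the discriminator and how fractionally strided (transposed) convolutions grow it in the generator. The kernel size of 4, stride of 2, and padding of 1 are assumptions chosen so that each layer exactly halves or doubles the spatial size. The functions use the standard output-size formulas and are not code from the paper.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after a strided convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after a fractionally strided (transposed) convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def trace(size, layer, n_layers):
    """Apply `layer` n_layers times and record every intermediate size."""
    sizes = [size]
    for _ in range(n_layers):
        size = layer(size)
        sizes.append(size)
    return sizes


# Generator: a 4x4 map upsampled by four transposed convolutions (assumed depth).
generator_sizes = trace(4, conv_transpose_out, 4)
# Discriminator: a 64x64 image downsampled by four strided convolutions.
discriminator_sizes = trace(64, conv_out, 4)

print(generator_sizes)      # [4, 8, 16, 32, 64]
print(discriminator_sizes)  # [64, 32, 16, 8, 4]
```

With these assumed settings no pooling layer is needed: the stride alone handles down- and upsampling, so the network learns its own spatial resampling.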
- Large-Scale Scene Understanding (LSUN).
- Imagenet-1K.
- Faces dataset.
- Minibatch SGD with minibatch size of 128.
- Weights initialized from a zero-centered Normal distribution with standard deviation = 0.02
- Adam Optimizer
- Slope of leak = 0.2 for LeakyReLU.
- Learning rate = 0.0002, β1 = 0.5
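
The hyperparameters listed above can be gathered into a small configuration object. The sketch below is only illustrative: the class name `DCGANHyperparams`, the helper names, and the fixed random seed are hypothetical, and it uses the standard library rather than any deep learning framework.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class DCGANHyperparams:
    batch_size: int = 128          # minibatch size
    init_std: float = 0.02         # std of the zero-centered Normal init
    leak_slope: float = 0.2        # LeakyReLU slope for negative inputs
    learning_rate: float = 0.0002  # Adam learning rate
    beta1: float = 0.5             # Adam first-moment decay


def init_weights(n, hp, rng):
    """Draw n weights from Normal(0, hp.init_std)."""
    return [rng.gauss(0.0, hp.init_std) for _ in range(n)]


def leaky_relu(x, hp):
    """LeakyReLU: identity for x >= 0, hp.leak_slope * x otherwise."""
    return x if x >= 0 else hp.leak_slope * x


hp = DCGANHyperparams()
rng = random.Random(0)  # fixed seed is an assumption, for repeatability only
weights = init_weights(10_000, hp, rng)

# The empirical mean and std should land close to 0 and 0.02.
print(statistics.mean(weights), statistics.stdev(weights))
print(leaky_relu(2.0, hp), leaky_relu(-2.0, hp))
```

In a real training script these values would be passed to the framework's initializer, activation, and Adam optimizer rather than reimplemented by hand.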
- Large-Scale Scene Understanding data
- Demonstrates that the model scales with more data and higher-resolution generation.
- The model is unlikely to have memorized images, given the low learning rate used with minibatch SGD.
- Classifying CIFAR-10 dataset
- Features
- Train on Imagenet-1K and test on CIFAR-10.
- Max pool discriminator's convolutional features (from all layers) to get 4x4 spatial grids.
- Flatten and concatenate to get a 28672-dimensional vector.
- Linear L2-SVM classifier trained over the feature vector.
- Achieves 82.8% accuracy, outperforming the K-means baseline (80.6%).
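
The feature pipeline above (max pool to 4x4 grids, flatten, concatenate) can be sketched in plain Python. The per-layer channel counts and spatial sizes below are hypothetical: they are chosen only because they reproduce the stated 28672-dimensional vector, and the random activations stand in for real discriminator features.

```python
import random


def max_pool_to_grid(channel, grid=4):
    """Max pool one square channel (nested lists) down to grid x grid."""
    size = len(channel)
    window = size // grid  # assumes size is a multiple of grid
    return [
        [
            max(
                channel[r][c]
                for r in range(i * window, (i + 1) * window)
                for c in range(j * window, (j + 1) * window)
            )
            for j in range(grid)
        ]
        for i in range(grid)
    ]


def extract_features(layers, grid=4):
    """Pool every channel of every layer, then flatten and concatenate."""
    features = []
    for feature_map in layers:
        for channel in feature_map:
            for row in max_pool_to_grid(channel, grid):
                features.extend(row)
    return features


def random_feature_map(channels, size, rng):
    """Stand-in for real discriminator activations of one layer."""
    return [
        [[rng.random() for _ in range(size)] for _ in range(size)]
        for _ in range(channels)
    ]


rng = random.Random(0)
# Hypothetical (channels, spatial size) per layer; not taken from the paper.
layer_shapes = [(256, 16), (512, 8), (1024, 4)]
layers = [random_feature_map(c, s, rng) for c, s in layer_shapes]

features = extract_features(layers)
print(len(features))  # (256 + 512 + 1024) * 4 * 4 = 28672
```

The resulting vector is what the linear L2-SVM would be trained on, one vector per image.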
- Street View House Number Classifier
- Similar pipeline to CIFAR-10
- 22.48% test error.
- The paper contains many examples of images generated by final and intermediate layers of the network.
- Images generated while moving through the latent space do not show sharp transitions, indicating that the network did not memorize images.
- DCGAN can learn an interesting hierarchy of features.
- The network seems to have some success in disentangling image representation from object representation.
- Vector arithmetic can be performed on the Z vectors corresponding to the face samples to get results such as "smiling woman - normal woman + normal man = smiling man" visually.
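
The latent-space observations above can be sketched as plain vector operations on Z. This is a minimal sketch that stops short of image generation: the latent size of 100, the Uniform(-1, 1) sampling, and the averaging of three vectors per concept are assumptions. The random vectors stand in for the Z vectors of samples that would, in practice, be picked by eye. A trained generator would still be needed to decode each resulting vector into an image.

```python
import random

Z_DIM = 100  # assumed latent size


def sample_z(rng, dim=Z_DIM):
    """Draw one latent vector from Uniform(-1, 1)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def mean_z(vectors):
    """Average several latent vectors element-wise."""
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]


def interpolate(z_a, z_b, steps):
    """Linearly interpolate from z_a to z_b, endpoints included."""
    return [
        [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]
        for t in (i / (steps - 1) for i in range(steps))
    ]


rng = random.Random(0)
# Hypothetical stand-ins for the Z vectors of hand-picked face samples.
smiling_woman = mean_z([sample_z(rng) for _ in range(3)])
normal_woman = mean_z([sample_z(rng) for _ in range(3)])
normal_man = mean_z([sample_z(rng) for _ in range(3)])

# smiling woman - normal woman + normal man = smiling man
smiling_man = [
    sw - nw + nm
    for sw, nw, nm in zip(smiling_woman, normal_woman, normal_man)
]

# Walking from one Z vector to another; decoding each step with the
# generator is what would reveal smooth (or sharp) visual transitions.
path = interpolate(normal_man, smiling_man, steps=9)
print(len(smiling_man), len(path))
```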