- Conditional version of Generative Adversarial Nets (GAN) where both generator and discriminator are conditioned on some data y (class label or data from some other modality).
- Link to the paper
- Feed y into both the generator and the discriminator as an additional input layer, so that y and the original input (noise z for the generator, data x for the discriminator) are combined in a joint hidden representation.
- Conditioning MNIST images on class labels.
- z (random noise) and y are mapped to ReLU hidden layers of sizes 200 and 1000 respectively, which are then combined into a joint ReLU hidden layer of dimensionality 1200.
- The discriminator maps x (input) and y to separate maxout layers, which are combined in a joint maxout layer that is fed to a sigmoid layer.
- The results do not outperform the state of the art but do provide a proof of concept.
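
The MNIST architecture described above can be sketched as a forward pass in NumPy. This is a minimal, untrained illustration rather than the authors' implementation: only the 200/1000/1200 ReLU sizes come from these notes. The noise dimension (100), the 784-unit sigmoid output, the maxout unit and piece counts, and the weight initialisation are assumptions, and helper names such as `init_dense` and `init_maxout` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: only the hidden sizes 200, 1000 and 1200 appear in the notes.
Z_DIM, N_CLASSES, X_DIM = 100, 10, 784


def relu(a):
    return np.maximum(a, 0.0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_dense(n_in, n_out):
    # Small Gaussian weights and zero biases (initialisation is an assumption).
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)


def init_maxout(n_in, units, pieces):
    # One linear map per piece; the layer keeps the max over pieces.
    return rng.normal(0.0, 0.05, size=(n_in, units, pieces)), np.zeros((units, pieces))


def dense(x, layer):
    W, b = layer
    return x @ W + b


def maxout(x, layer):
    W, b = layer
    # tensordot contracts the feature axis: result is (batch, units, pieces).
    return (np.tensordot(x, W, axes=1) + b).max(axis=2)


gen = {
    "z": init_dense(Z_DIM, 200),            # noise branch
    "y": init_dense(N_CLASSES, 1000),       # label branch
    "joint": init_dense(200 + 1000, 1200),  # combined ReLU layer
    "out": init_dense(1200, X_DIM),         # assumed sigmoid output layer
}

disc = {
    "x": init_maxout(X_DIM, 240, 5),        # assumed maxout sizes
    "y": init_maxout(N_CLASSES, 50, 5),
    "joint": init_maxout(240 + 50, 240, 4),
    "out": init_dense(240, 1),
}


def generator(z, y_onehot):
    h_z = relu(dense(z, gen["z"]))
    h_y = relu(dense(y_onehot, gen["y"]))
    h = relu(dense(np.concatenate([h_z, h_y], axis=1), gen["joint"]))
    return sigmoid(dense(h, gen["out"]))


def discriminator(x, y_onehot):
    h_x = maxout(x, disc["x"])
    h_y = maxout(y_onehot, disc["y"])
    h = maxout(np.concatenate([h_x, h_y], axis=1), disc["joint"])
    return sigmoid(dense(h, disc["out"]))


labels = np.array([0, 3, 7, 9])
y = np.eye(N_CLASSES)[labels]                      # one-hot class labels
z = rng.uniform(0.0, 1.0, size=(len(labels), Z_DIM))

samples = generator(z, y)        # one fake image per requested label
probs = discriminator(samples, y)  # D's estimate that each (x, y) pair is real
print(samples.shape, probs.shape)
```

Both networks receive the same `y`, which is what makes the model conditional: after training, changing `labels` would change which digit the generator is asked to produce.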
- Map images (from Flickr) to labels (or user tags) to learn a one-to-many mapping, since a single image can have many valid tags.
- Extract image features using a convolutional model and text features using a language model.
- Generative Model
- Map noise and convolutional features to a single 200-dimensional representation.
- Discriminator Model
- Combine the representations of the word vectors (corresponding to tags) and the image features.
- Although the results are modest, they show the potential of Conditional GANs, especially in the multimodal setting.
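
The generative model for the tagging experiment can be sketched in the same way. This is an untrained illustration under stated assumptions: only the 200-dimensional joint output comes from these notes. The noise size (100), the image feature size (4096), the hidden sizes (500 and 2000) and the initialisation are assumptions, the image features are random placeholders standing in for the output of a convolutional model, and `tag_generator` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Only WORD_DIM = 200 is taken from the notes; the other sizes are assumptions.
NOISE_DIM, IMG_FEAT_DIM, WORD_DIM = 100, 4096, 200


def relu(a):
    return np.maximum(a, 0.0)


def init_dense(n_in, n_out):
    # Small Gaussian weights and zero biases (initialisation is an assumption).
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)


W_z, b_z = init_dense(NOISE_DIM, 500)          # noise branch (assumed size)
W_f, b_f = init_dense(IMG_FEAT_DIM, 2000)      # image feature branch (assumed size)
W_j, b_j = init_dense(500 + 2000, WORD_DIM)    # joint 200-dimensional layer


def tag_generator(z, img_feat):
    """Map noise and image features to a 200-dimensional word-vector-like output."""
    h_z = relu(z @ W_z + b_z)
    h_f = relu(img_feat @ W_f + b_f)
    return np.concatenate([h_z, h_f], axis=1) @ W_j + b_j


# Placeholder for the convolutional features of a single image.
img_feat = rng.normal(size=(1, IMG_FEAT_DIM))

# Several noise draws for the same image give several candidate tag vectors.
n_draws = 5
z = rng.normal(size=(n_draws, NOISE_DIM))
word_vecs = tag_generator(z, np.repeat(img_feat, n_draws, axis=0))
print(word_vecs.shape)
```

Holding the image features fixed while resampling the noise is what gives the one-to-many behaviour: each draw yields a different vector, which in a trained model would be matched to its nearest words in the language model's vocabulary.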