PyTorch-CycleGAN-and-pix2pix: Training Guide

This guide covers how to use the PyTorch-CycleGAN-and-pix2pix repository for image-to-image translation tasks, including:

  • Command line options for training and testing
  • Dataset preparation
  • Custom dataset implementation

Command Line Options

The repository organizes its command-line options into base options, shared by both training and testing, plus mode-specific options for each phase. Let's explore these options in detail.
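
Under the hood, these live in the options/ directory: BaseOptions defines the shared arguments and TrainOptions/TestOptions extend it. A simplified sketch of the pattern follows (argument lists heavily abbreviated; see options/base_options.py and options/train_options.py for the full set):

import argparse

class BaseOptions:
    """Options shared by training and testing (abbreviated sketch)."""
    def initialize(self, parser):
        parser.add_argument('--dataroot', required=True, help='path to images')
        parser.add_argument('--name', type=str, default='experiment_name', help='experiment name')
        # ... the rest of the shared options
        return parser

class TrainOptions(BaseOptions):
    """Training-only options layered on top of the base options (abbreviated sketch)."""
    def initialize(self, parser):
        parser = BaseOptions.initialize(self, parser)  # start from the shared set
        parser.add_argument('--lr', type=float, default=0.0002, help='initial learning rate')
        # ... the rest of the training options
        return parser

# Illustrative usage (the real repository wraps this in a parse() method):
# parser = TrainOptions().initialize(argparse.ArgumentParser())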

Base Options

These options are shared between training and testing:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --dataroot | string | required | Path to the dataset |
| --name | string | experiment_name | Name of the experiment (used for saving results and models) |
| --gpu_ids | list | 0 | IDs of GPUs to use (e.g., 0,1,2 for multi-GPU; -1 for CPU) |
| --checkpoints_dir | string | ./checkpoints | Directory to save model checkpoints |
| --model | string | cycle_gan | Model type: cycle_gan, pix2pix, test, colorization, etc. |
| --input_nc | int | 3 | Number of input image channels |
| --output_nc | int | 3 | Number of output image channels |
| --ngf | int | 64 | Number of generator filters in the last conv layer |
| --ndf | int | 64 | Number of discriminator filters in the first conv layer |
| --netD | string | basic | Discriminator architecture: basic, n_layers, pixel |
| --netG | string | resnet_9blocks | Generator architecture: resnet_9blocks, resnet_6blocks, unet_256, unet_128 |
| --n_layers_D | int | 3 | Number of discriminator layers when --netD n_layers is used |
| --norm | string | instance | Normalization type: instance, batch, none |
| --init_type | string | normal | Weight initialization: normal, xavier, kaiming, orthogonal |
| --init_gain | float | 0.02 | Scaling factor for initialization |
| --no_dropout | bool | False | Disable dropout in the generator |
| --dataset_mode | string | unaligned | Dataset mode: unaligned (CycleGAN), aligned (pix2pix), single, colorization |
| --direction | string | AtoB | Translation direction: AtoB or BtoA |
| --serial_batches | bool | False | If set, load images in a fixed order rather than randomly (useful for debugging) |
| --num_threads | int | 4 | Number of threads for data loading |
| --batch_size | int | 1 | Input batch size |
| --load_size | int | 286 | Scale images to this size before cropping to crop_size |
| --crop_size | int | 256 | Crop images to this size |
| --max_dataset_size | int | inf | Maximum number of samples to load from the dataset |
| --preprocess | string | resize_and_crop | Preprocessing type: resize_and_crop, crop, scale_width, scale_width_and_crop, none |
| --no_flip | bool | False | If set, do not randomly flip images for augmentation |
| --display_winsize | int | 256 | Display window size for visdom |
| --epoch | string | latest | Which epoch to load: latest or a specific epoch number |
| --load_iter | int | 0 | If > 0, load the model saved at this iteration; otherwise load by epoch |
| --verbose | bool | False | If set, print extra debugging information |
| --suffix | string | none | Customized suffix appended to the experiment name |
| --use_wandb | bool | False | Use Weights & Biases for logging |
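
All of these are plain command-line flags, so they compose freely. For example, a CycleGAN run with explicit preprocessing and architecture choices (values here are illustrative, not recommendations):

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --netG resnet_9blocks --norm instance --load_size 286 --crop_size 256 --batch_size 1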

Training Options

Additional options available during training:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --display_freq | int | 400 | Frequency (in iterations) of showing training results on screen |
| --display_ncols | int | 4 | Number of images per row in the visdom display |
| --display_id | int | 1 | Window ID for the visdom display; set to 0 to disable visdom |
| --display_server | string | http://localhost | Visdom server address |
| --display_env | string | main | Visdom environment name |
| --display_port | int | 8097 | Visdom port |
| --update_html_freq | int | 1000 | Frequency of saving training results to HTML |
| --print_freq | int | 100 | Frequency of printing training losses to the console |
| --no_html | bool | False | Do not save intermediate training results to the web directory |
| --save_latest_freq | int | 5000 | Frequency (in iterations) of saving the latest model |
| --save_epoch_freq | int | 5 | Frequency (in epochs) of saving checkpoints |
| --save_by_iter | bool | False | Save models by iteration instead of by epoch |
| --continue_train | bool | False | Continue training from the last saved checkpoint |
| --epoch_count | int | 1 | Starting epoch count |
| --n_epochs | int | 100 | Number of epochs at the initial learning rate |
| --n_epochs_decay | int | 100 | Number of epochs over which the learning rate decays linearly to zero |
| --phase | string | train | Phase name: train, val, test, etc. |
| --beta1 | float | 0.5 | Momentum term (beta1) for the Adam optimizer |
| --lr | float | 0.0002 | Initial learning rate |
| --gan_mode | string | lsgan | GAN loss mode: vanilla, lsgan, or wgangp |
| --pool_size | int | 50 | Size of the buffer that stores previously generated images (reduces model oscillation) |
| --lr_policy | string | linear | Learning rate policy: linear, step, plateau, cosine |
| --lr_decay_iters | int | 50 | Decay the learning rate every lr_decay_iters iterations (step policy) |
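
For example, to resume an interrupted run from its latest checkpoint while keeping the epoch numbering consistent (values here are illustrative):

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --continue_train --epoch_count 101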

Testing Options

Additional options available during testing:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --phase | string | test | Testing phase |
| --eval | bool | False | Use evaluation mode during test time |
| --num_test | int | 50 | Number of test images to run |
| --aspect_ratio | float | 1.0 | Aspect ratio of result images |
| --results_dir | string | ./results/ | Directory to save test results |
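
For example, to evaluate more images than the default and write them to a custom directory (illustrative values):

python test.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --num_test 200 --results_dir ./results_maps/ --eval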

Model-Specific Options

CycleGAN Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --lambda_A | float | 10.0 | Weight for cycle-consistency loss A -> B -> A |
| --lambda_B | float | 10.0 | Weight for cycle-consistency loss B -> A -> B |
| --lambda_identity | float | 0.5 | Weight for the identity mapping loss, scaled relative to the cycle losses; set to 0 to disable |
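
These weights combine into a single generator objective. The sketch below mirrors the arithmetic in models/cycle_gan_model.py; the function itself is hypothetical, and the individual loss terms would be computed by the model:

def combined_generator_loss(loss_G_A, loss_G_B, cycle_A, cycle_B, idt_A, idt_B,
                            lambda_A=10.0, lambda_B=10.0, lambda_identity=0.5):
    """Sum the CycleGAN generator losses with their weights (all inputs are plain floats here)."""
    return (loss_G_A + loss_G_B                    # adversarial losses for both generators
            + lambda_A * cycle_A                   # A -> B -> A cycle-consistency loss
            + lambda_B * cycle_B                   # B -> A -> B cycle-consistency loss
            + lambda_B * lambda_identity * idt_A   # identity loss for G_A on real B images
            + lambda_A * lambda_identity * idt_B)  # identity loss for G_B on real A images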

pix2pix Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --lambda_L1 | float | 100.0 | Weight for L1 loss |
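
In models/pix2pix_model.py the generator objective is simply loss_G = loss_G_GAN + lambda_L1 * L1(fake_B, real_B), so raising --lambda_L1 pulls outputs toward the paired ground truth, while lowering it gives the adversarial term more influence.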

Training Commands

Training CycleGAN

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

Training pix2pix

python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA

Testing Commands

Testing CycleGAN

python test.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

Testing pix2pix

python test.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
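
Test results are saved as an HTML page under the results directory, by default at ./results/[name]/[phase]_[epoch]/index.html (for example, ./results/facades_pix2pix/test_latest/index.html).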

Apply a Pre-trained Model

# Download a pre-trained model
bash ./scripts/download_cyclegan_model.sh horse2zebra

# Test the model
python test.py --dataroot datasets/horse2zebra/testA --name horse2zebra_pretrained --model test --no_dropout

Preparing Custom Datasets

The repository supports various dataset formats. Here's how to prepare them:

Dataset Structure

Aligned Datasets (for pix2pix)

For paired image-to-image translation (pix2pix), the dataset should be structured as follows:

custom_dataset/
├── train/
│   └── [combined images, each with input and output side-by-side]
├── val/
│   └── [combined images, each with input and output side-by-side]
└── test/
    └── [combined images, each with input and output side-by-side]

Each image file should contain both the input and output side-by-side in a single image. You can use the script datasets/combine_A_and_B.py to combine images from two folders into paired images; it expects fold_A and fold_B to contain matching train/val/test subfolders with identically named files.

python datasets/combine_A_and_B.py --fold_A /path/to/input_images --fold_B /path/to/output_images --fold_AB /path/to/output_directory

Unaligned Datasets (for CycleGAN)

For unpaired image-to-image translation (CycleGAN), organize your dataset as follows:

custom_dataset/
├── trainA/
│   └── [images from domain A]
├── trainB/
│   └── [images from domain B]
├── testA/
│   └── [images from domain A for testing]
└── testB/
    └── [images from domain B for testing]

Dataset Downloading

The repository provides scripts to download various datasets:

# Download CycleGAN datasets
bash ./datasets/download_cyclegan_dataset.sh [dataset_name]

# Download pix2pix datasets
bash ./datasets/download_pix2pix_dataset.sh [dataset_name]

Available dataset names for CycleGAN: apple2orange, summer2winter_yosemite, horse2zebra, monet2photo, cezanne2photo, ukiyoe2photo, vangogh2photo, maps, cityscapes, facades, iphone2dslr_flower.

Available dataset names for pix2pix: facades, cityscapes, maps, edges2shoes, edges2handbags.

Creating a Custom Dataset

To implement a custom dataset:

  1. Create a Dataset Class:

    Create a new file in the data directory, for example, custom_dataset.py:

    import os
    import random
    
    from data.base_dataset import BaseDataset, get_transform
    from data.image_folder import make_dataset
    from PIL import Image
    
    class CustomDataset(BaseDataset):
        def __init__(self, opt):
            BaseDataset.__init__(self, opt)
            # 'trainA'/'trainB' during training, 'testA'/'testB' at test time
            self.dir_A = os.path.join(opt.dataroot, opt.phase + 'A')
            self.dir_B = os.path.join(opt.dataroot, opt.phase + 'B')
            
            self.A_paths = sorted(make_dataset(self.dir_A, opt.max_dataset_size))
            self.B_paths = sorted(make_dataset(self.dir_B, opt.max_dataset_size))
            self.A_size = len(self.A_paths)
            self.B_size = len(self.B_paths)
            
            self.transform_A = get_transform(opt)
            self.transform_B = get_transform(opt)
    
        def __getitem__(self, index):
            A_path = self.A_paths[index % self.A_size]
            if self.opt.serial_batches:   # deterministic pairing (useful for debugging)
                index_B = index % self.B_size
            else:                         # random pairing, as in the unaligned dataset
                index_B = random.randint(0, self.B_size - 1)
            B_path = self.B_paths[index_B]
            
            A_img = Image.open(A_path).convert('RGB')
            B_img = Image.open(B_path).convert('RGB')
            
            A = self.transform_A(A_img)
            B = self.transform_B(B_img)
            
            return {'A': A, 'B': B, 'A_paths': A_path, 'B_paths': B_path}
    
        def __len__(self):
            # the dataset length is the size of the larger domain
            return max(self.A_size, self.B_size)
  2. Register the Dataset:

    Modify data/__init__.py to include your custom dataset. (Note: recent versions of the repository discover dataset classes automatically by name, so a file named custom_dataset.py containing a CustomDataset class may already be picked up by --dataset_mode custom; the simplified version below shows the registration explicitly.)

    from data.custom_dataset import CustomDataset
    
    def create_dataset(opt):
        if opt.dataset_mode == 'aligned':
            from data.aligned_dataset import AlignedDataset
            dataset = AlignedDataset(opt)
        elif opt.dataset_mode == 'unaligned':
            from data.unaligned_dataset import UnalignedDataset
            dataset = UnalignedDataset(opt)
        elif opt.dataset_mode == 'custom':  # add this branch
            dataset = CustomDataset(opt)    # add this branch
        # ... other dataset modes
        else:
            raise ValueError("Dataset [%s] not recognized." % opt.dataset_mode)
    
        print("dataset [%s] was created" % type(dataset).__name__)
        return dataset
  3. Use Your Custom Dataset:

    python train.py --dataroot ./datasets/your_custom_dataset --name custom_experiment --model cycle_gan --dataset_mode custom
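
A quick way to verify the three steps worked is to build the dataset the same way train.py does and inspect one sample. A minimal sanity-check sketch (hypothetical paths; run from the repository root):

from options.train_options import TrainOptions
from data import create_dataset

# Parses command-line flags, e.g.:
#   python check_dataset.py --dataroot ./datasets/your_custom_dataset --dataset_mode custom
opt = TrainOptions().parse()
dataset = create_dataset(opt)
print('number of samples:', len(dataset))

first = next(iter(dataset))                # grab one sample
print(first['A'].shape, first['B'].shape)  # per-sample tensors, e.g. [3, 256, 256]
                                           # (a batched loader adds a leading batch dimension)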

Tips for Training

  1. Appropriate Batch Size: Start with batch size 1 and increase if your GPU has enough memory.
  2. Learning Rate: The default learning rate (0.0002) works well for most cases, but you may need to adjust it for your specific dataset.
  3. Generator Architecture: resnet_9blocks works well for CycleGAN, while unet_256 is suitable for pix2pix.
  4. Image Size: The default crop size is 256x256. For larger images, you might need to adjust the network architecture.
  5. Monitoring: Use --display_freq and --print_freq to monitor training progress.
  6. Saving Frequency: Adjust --save_latest_freq and --save_epoch_freq based on your dataset size.
  7. Normalization: instance normalization works well for style transfer, while batch normalization is often better for other applications.
  8. Multi-GPU Training: Use --gpu_ids to specify multiple GPUs for faster training (see the example after this list).
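
As a concrete instance of tips 1 and 8 together (illustrative GPU IDs; scale batch_size with the number of GPUs):

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --gpu_ids 0,1,2,3 --batch_size 4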

Troubleshooting

  1. Out of Memory: Reduce batch size or image size, or use a smaller network architecture.
  2. Poor Image Quality: Adjust loss weights (lambda_A, lambda_B, lambda_identity, lambda_L1).
  3. Mode Collapse: Try a larger image buffer (pool_size) or a different GAN loss (gan_mode).
  4. Unstable Training: Lower the learning rate or try a different GAN loss (gan_mode); see the example after this list.
  5. Slow Training: Use multi-GPU training or reduce image size.
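
For instance, one possible starting point for items 3 and 4, combining a larger image buffer, a different GAN loss, and a lower learning rate (illustrative values, not recommendations):

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --pool_size 100 --gan_mode vanilla --lr 0.0001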