This guide covers how to use the PyTorch-CycleGAN-and-pix2pix repository for image-to-image translation tasks, including:
- Command line options for training and testing
- Dataset preparation
- Custom dataset implementation
The repository organizes its command-line options in layers: base options shared by both training and testing, plus options specific to each mode. Let's explore these options in detail.
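As a simplified sketch of this layering (mirroring the pattern in the repository's `options/base_options.py` and `options/train_options.py`; the flags and details are abbreviated here):

```python
import argparse

# Simplified sketch of the options layering: BaseOptions holds flags shared
# by train.py and test.py; TrainOptions inherits it and adds training flags.
class BaseOptions:
    def initialize(self, parser):
        parser.add_argument('--dataroot', required=True, help='path to the dataset')
        parser.add_argument('--model', type=str, default='cycle_gan', help='model type')
        return parser

class TrainOptions(BaseOptions):
    def initialize(self, parser):
        parser = BaseOptions.initialize(self, parser)  # pull in shared options first
        parser.add_argument('--lr', type=float, default=0.0002, help='initial learning rate')
        return parser

if __name__ == '__main__':
    parser = TrainOptions().initialize(argparse.ArgumentParser())
    opt = parser.parse_args()
```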
These options are shared between training and testing:
Option | Type | Default | Description |
---|---|---|---|
`--dataroot` | string | required | Path to the dataset |
`--name` | string | `experiment_name` | Name of the experiment (used for saving results and models) |
`--gpu_ids` | string | `0` | Comma-separated GPU IDs (e.g., `0,1,2` for multi-GPU; `-1` for CPU) |
`--checkpoints_dir` | string | `./checkpoints` | Directory to save model checkpoints |
`--model` | string | `cycle_gan` | Model type: `cycle_gan`, `pix2pix`, `test`, `colorization`, etc. |
`--input_nc` | int | `3` | Number of input image channels |
`--output_nc` | int | `3` | Number of output image channels |
`--ngf` | int | `64` | Number of generator filters in the last conv layer |
`--ndf` | int | `64` | Number of discriminator filters in the first conv layer |
`--netD` | string | `basic` | Discriminator architecture: `basic`, `n_layers`, `pixel` |
`--netG` | string | `resnet_9blocks` | Generator architecture: `resnet_9blocks`, `resnet_6blocks`, `unet_256`, `unet_128` |
`--n_layers_D` | int | `3` | Number of discriminator layers when `--netD n_layers` |
`--norm` | string | `instance` | Normalization type: `instance`, `batch`, `none` |
`--init_type` | string | `normal` | Weight initialization: `normal`, `xavier`, `kaiming`, `orthogonal` |
`--init_gain` | float | `0.02` | Scaling factor for initialization |
`--no_dropout` | flag | `False` | Disable dropout in the generator |
`--dataset_mode` | string | `unaligned` | Dataset mode: `unaligned` (CycleGAN), `aligned` (pix2pix), `single`, `colorization` |
`--direction` | string | `AtoB` | Translation direction: `AtoB` or `BtoA` |
`--serial_batches` | flag | `False` | Load data in the same order each time (useful for debugging) |
`--num_threads` | int | `4` | Number of threads for data loading |
`--batch_size` | int | `1` | Input batch size |
`--load_size` | int | `286` | Scale images to this size before cropping to `crop_size` |
`--crop_size` | int | `256` | Crop images to this size |
`--max_dataset_size` | int | `inf` | Maximum number of samples to load from the dataset |
`--preprocess` | string | `resize_and_crop` | Preprocessing: `resize_and_crop`, `crop`, `scale_width`, `scale_width_and_crop`, `none` |
`--no_flip` | flag | `False` | Do not randomly flip images during training |
`--display_winsize` | int | `256` | Display window size for visdom and HTML |
`--epoch` | string | `latest` | Which epoch to load: `latest` or a specific epoch number |
`--load_iter` | int | `0` | If > 0, load the model saved at this iteration instead of by epoch |
`--verbose` | flag | `False` | Print more debugging information |
`--suffix` | string | `''` | Customized suffix appended to the experiment name |
`--use_wandb` | flag | `False` | Use Weights & Biases for logging |
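For example, a hypothetical run that overrides several of these defaults at once (the dataset path and experiment name are placeholders):

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --netG resnet_6blocks --norm instance --load_size 286 --crop_size 256 --batch_size 1
```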
Additional options available during training:
Option | Type | Default | Description |
---|---|---|---|
`--display_freq` | int | `400` | Frequency (in iterations) of showing training results on screen |
`--display_ncols` | int | `4` | Number of images per row in the visdom display |
`--display_id` | int | `1` | Window ID of the visdom display (set to `0` to disable visdom) |
`--display_server` | string | `http://localhost` | Visdom server address |
`--display_env` | string | `main` | Visdom environment name |
`--display_port` | int | `8097` | Visdom port |
`--update_html_freq` | int | `1000` | Frequency of saving training results to HTML |
`--print_freq` | int | `100` | Frequency of printing training losses to the console |
`--no_html` | flag | `False` | Do not save intermediate training results to the web directory |
`--save_latest_freq` | int | `5000` | Frequency (in iterations) of saving the latest model |
`--save_epoch_freq` | int | `5` | Frequency (in epochs) of saving checkpoints |
`--save_by_iter` | flag | `False` | Save models by iteration instead of by epoch |
`--continue_train` | flag | `False` | Continue training from the last checkpoint |
`--epoch_count` | int | `1` | Starting epoch count (useful with `--continue_train`) |
`--n_epochs` | int | `100` | Number of epochs at the initial learning rate |
`--n_epochs_decay` | int | `100` | Number of epochs over which the learning rate decays linearly to zero |
`--phase` | string | `train` | Phase: `train` or `test` |
`--beta1` | float | `0.5` | Momentum term for the Adam optimizer |
`--lr` | float | `0.0002` | Initial learning rate |
`--gan_mode` | string | `lsgan` | GAN loss: `vanilla`, `lsgan`, or `wgangp` |
`--pool_size` | int | `50` | Size of the image buffer holding previously generated images (reduces oscillation) |
`--lr_policy` | string | `linear` | Learning rate policy: `linear`, `step`, `plateau`, `cosine` |
`--lr_decay_iters` | int | `50` | Decay interval for the `step` learning rate policy |
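For example, a run resumed from the latest checkpoint (the dataset path and experiment name are placeholders; `--epoch_count` keeps the epoch numbering continuous):

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --continue_train --epoch_count 101
```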
Additional options available during testing:
Option | Type | Default | Description |
---|---|---|---|
`--phase` | string | `test` | Testing phase |
`--eval` | flag | `False` | Use evaluation mode during test time |
`--num_test` | int | `50` | Number of test images to process |
`--aspect_ratio` | float | `1.0` | Aspect ratio of result images |
`--results_dir` | string | `./results/` | Directory to save test results |
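For example, a hypothetical test run that processes 200 images in eval mode and writes to a custom folder:

```bash
python test.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --eval --num_test 200 --results_dir ./results/my_experiment_eval/
```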
Options specific to the CycleGAN model:

Option | Type | Default | Description |
---|---|---|---|
`--lambda_A` | float | `10.0` | Weight for the forward cycle loss (A -> B -> A) |
`--lambda_B` | float | `10.0` | Weight for the backward cycle loss (B -> A -> B) |
`--lambda_identity` | float | `0.5` | Weight for the identity mapping loss (set to `0` to disable) |
Options specific to the pix2pix model:

Option | Type | Default | Description |
---|---|---|---|
`--lambda_L1` | float | `100.0` | Weight for the L1 loss between generated and target images |
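These weights are ordinary command-line flags, so they can be tuned per run; two hypothetical examples:

```bash
# pix2pix with a weaker L1 term
python train.py --dataroot ./datasets/facades --name facades_l1_50 --model pix2pix --direction BtoA --lambda_L1 50.0

# CycleGAN with the identity loss disabled
python train.py --dataroot ./datasets/maps --name maps_no_idt --model cycle_gan --lambda_identity 0.0
```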
Example training commands:

```bash
# Train a CycleGAN model
python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

# Train a pix2pix model
python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
```

Example testing commands:

```bash
# Test a CycleGAN model
python test.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

# Test a pix2pix model
python test.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
```

To apply a pre-trained model to a single folder of images:

```bash
# Download a pre-trained model
bash ./scripts/download_cyclegan_model.sh horse2zebra

# Test the model
python test.py --dataroot datasets/horse2zebra/testA --name horse2zebra_pretrained --model test --no_dropout
```
The repository supports various dataset formats. Here's how to prepare them:
For paired image-to-image translation (pix2pix), the dataset should be structured as follows:
```
custom_dataset/
├── train/
│   └── [paired images: input and output side by side]
├── val/
│   └── [paired images: input and output side by side]
└── test/
    └── [paired images: input and output side by side]
```
Each image should contain both the input and output side by side. You can use the script `scripts/combine_A_and_B.py` to combine images from two folders into paired images:

```bash
python scripts/combine_A_and_B.py --fold_A /path/to/input_images --fold_B /path/to/output_images --fold_AB /path/to/output_directory
```
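By default, `combine_A_and_B.py` expects `fold_A` and `fold_B` to contain the same split subdirectories with identically named images in each; a hypothetical layout before combining:

```
/path/to/input_images/        /path/to/output_images/
├── train/                    ├── train/
│   ├── 1.jpg                 │   ├── 1.jpg
│   └── 2.jpg                 │   └── 2.jpg
└── test/                     └── test/
```

The horizontally concatenated pairs are written to the matching subdirectories of `fold_AB`.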
For unpaired image-to-image translation (CycleGAN), organize your dataset as follows:
```
custom_dataset/
├── trainA/
│   └── [images from domain A]
├── trainB/
│   └── [images from domain B]
├── testA/
│   └── [images from domain A for testing]
└── testB/
    └── [images from domain B for testing]
```
The repository provides scripts to download various datasets:
```bash
# Download CycleGAN datasets
bash ./datasets/download_cyclegan_dataset.sh [dataset_name]

# Download pix2pix datasets
bash ./datasets/download_pix2pix_dataset.sh [dataset_name]
```
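For example, to fetch the horse2zebra dataset into `./datasets/horse2zebra`:

```bash
bash ./datasets/download_cyclegan_dataset.sh horse2zebra
```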
Available dataset names for CycleGAN: `apple2orange`, `summer2winter_yosemite`, `horse2zebra`, `monet2photo`, `cezanne2photo`, `ukiyoe2photo`, `vangogh2photo`, `maps`, `cityscapes`, `facades`, `iphone2dslr_flower`.

Available dataset names for pix2pix: `facades`, `cityscapes`, `maps`, `edges2shoes`, `edges2handbags`.
To implement a custom dataset:
1. Create a Dataset Class: Create a new file in the `data` directory, for example `custom_dataset.py`:

   ```python
   import os

   from PIL import Image

   from data.base_dataset import BaseDataset, get_transform
   from data.image_folder import make_dataset


   class CustomDataset(BaseDataset):
       def __init__(self, opt):
           BaseDataset.__init__(self, opt)
           self.dir_A = os.path.join(opt.dataroot, 'trainA')  # path to images in domain A
           self.dir_B = os.path.join(opt.dataroot, 'trainB')  # path to images in domain B
           self.A_paths = sorted(make_dataset(self.dir_A, opt.max_dataset_size))
           self.B_paths = sorted(make_dataset(self.dir_B, opt.max_dataset_size))
           self.A_size = len(self.A_paths)
           self.B_size = len(self.B_paths)
           self.transform_A = get_transform(opt)
           self.transform_B = get_transform(opt)

       def __getitem__(self, index):
           A_path = self.A_paths[index % self.A_size]
           B_path = self.B_paths[index % self.B_size]
           A_img = Image.open(A_path).convert('RGB')
           B_img = Image.open(B_path).convert('RGB')
           A = self.transform_A(A_img)
           B = self.transform_B(B_img)
           return {'A': A, 'B': B, 'A_paths': A_path, 'B_paths': B_path}

       def __len__(self):
           return max(self.A_size, self.B_size)
   ```
2. Register the Dataset: Recent versions of the repository discover dataset classes automatically: `data/__init__.py` resolves `--dataset_mode custom` to the file `data/custom_dataset.py` containing a class named `CustomDataset`, so no registration code may be needed. If your copy instead dispatches explicitly inside `create_dataset`, add a branch for the new class (a sketch; note the dataset is constructed with `opt`, matching the class defined in step 1):

   ```python
   def create_dataset(opt):
       if opt.dataset_mode == 'aligned':
           from data.aligned_dataset import AlignedDataset
           dataset = AlignedDataset(opt)
       elif opt.dataset_mode == 'unaligned':
           from data.unaligned_dataset import UnalignedDataset
           dataset = UnalignedDataset(opt)
       elif opt.dataset_mode == 'custom':  # add this branch
           from data.custom_dataset import CustomDataset
           dataset = CustomDataset(opt)
       # ... other dataset modes
       else:
           raise ValueError("Dataset [%s] not recognized." % opt.dataset_mode)
       print("dataset [%s] was created" % type(dataset).__name__)
       return dataset
   ```
3. Use Your Custom Dataset:

   ```bash
   python train.py --dataroot ./datasets/your_custom_dataset --name custom_experiment --model cycle_gan --dataset_mode custom
   ```
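Testing works the same way, reusing the experiment name from training:

```bash
python test.py --dataroot ./datasets/your_custom_dataset --name custom_experiment --model cycle_gan --dataset_mode custom
```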
Some practical tips for training:

- Appropriate Batch Size: Start with batch size 1 and increase it if your GPU has enough memory.
- Learning Rate: The default learning rate (0.0002) works well in most cases, but you may need to adjust it for your specific dataset.
- Generator Architecture: `resnet_9blocks` works well for CycleGAN, while `unet_256` is suitable for pix2pix.
- Image Size: The default crop size is 256x256. For larger images, you may need to adjust the network architecture.
- Monitoring: Use `--display_freq` and `--print_freq` to monitor training progress.
- Saving Frequency: Adjust `--save_latest_freq` and `--save_epoch_freq` based on your dataset size.
- Normalization: `instance` normalization works well for style transfer, while `batch` normalization is often better for other applications.
- Multi-GPU Training: Use `--gpu_ids` to specify multiple GPUs for faster training (see the example after this list).
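A hypothetical multi-GPU run combining several of these tips (the GPU IDs, batch size, and paths depend on your setup):

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --gpu_ids 0,1 --batch_size 4 --norm batch
```

With batch sizes above 1, `batch` normalization can be worth trying in place of the default `instance` normalization.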
Common issues and how to address them:

- Out of Memory: Reduce the batch size or image size, or use a smaller network architecture.
- Poor Image Quality: Adjust the loss weights (`--lambda_A`, `--lambda_B`, `--lambda_identity`, `--lambda_L1`).
- Mode Collapse: Try a larger image buffer (`--pool_size`) or adjust the GAN mode.
- Unstable Training: Adjust the learning rate or try a different GAN loss (`--gan_mode`); see the example after this list.
- Slow Training: Use multi-GPU training or reduce the image size.
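For instance, a hypothetical re-run that addresses instability by switching the GAN loss, enlarging the image buffer, and lowering the learning rate:

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --gan_mode vanilla --pool_size 100 --lr 0.0001
```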