This guide covers how to use the PyTorch-CycleGAN-and-pix2pix repository for image-to-image translation tasks, including:
- Command line options for training and testing
- Dataset preparation
- Custom dataset implementation
The repository organizes its command-line options in layers: base options shared by both training and testing, plus options specific to each mode. Let's explore these options in detail.
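As a simplified sketch of this layering (mirroring the pattern in the repository's `options/base_options.py` and `options/train_options.py`; the flags and details are abbreviated here):

```python
import argparse

# Simplified sketch of the options layering: BaseOptions holds flags shared
# by train.py and test.py; TrainOptions inherits it and adds training flags.
class BaseOptions:
    def initialize(self, parser):
        parser.add_argument('--dataroot', required=True, help='path to the dataset')
        parser.add_argument('--model', type=str, default='cycle_gan', help='model type')
        return parser

class TrainOptions(BaseOptions):
    def initialize(self, parser):
        parser = BaseOptions.initialize(self, parser)  # pull in shared options first
        parser.add_argument('--lr', type=float, default=0.0002, help='initial learning rate')
        return parser

if __name__ == '__main__':
    parser = TrainOptions().initialize(argparse.ArgumentParser())
    opt = parser.parse_args()
```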
These options are shared between training and testing:
Option | Type | Default | Description |
---|---|---|---|
`--dataroot` | string | required | Path to the dataset |
`--name` | string | `experiment_name` | Name of the experiment (used for saving results and models) |
`--gpu_ids` | string | `0` | Comma-separated GPU IDs (e.g., `0,1,2` for multi-GPU; `-1` for CPU) |
`--checkpoints_dir` | string | `./checkpoints` | Directory to save model checkpoints |
`--model` | string | `cycle_gan` | Model type: `cycle_gan`, `pix2pix`, `test`, `colorization`, etc. |
`--input_nc` | int | `3` | Number of input image channels |
`--output_nc` | int | `3` | Number of output image channels |
`--ngf` | int | `64` | Number of generator filters in the last conv layer |
`--ndf` | int | `64` | Number of discriminator filters in the first conv layer |
`--netD` | string | `basic` | Discriminator architecture: `basic`, `n_layers`, `pixel` |
`--netG` | string | `resnet_9blocks` | Generator architecture: `resnet_9blocks`, `resnet_6blocks`, `unet_256`, `unet_128` |
`--n_layers_D` | int | `3` | Number of discriminator layers when `--netD n_layers` |
`--norm` | string | `instance` | Normalization type: `instance`, `batch`, `none` |
`--init_type` | string | `normal` | Weight initialization: `normal`, `xavier`, `kaiming`, `orthogonal` |
`--init_gain` | float | `0.02` | Scaling factor for initialization |
`--no_dropout` | flag | `False` | Disable dropout in the generator |
`--dataset_mode` | string | `unaligned` | Dataset mode: `unaligned` (CycleGAN), `aligned` (pix2pix), `single`, `colorization` |
`--direction` | string | `AtoB` | Translation direction: `AtoB` or `BtoA` |
`--serial_batches` | flag | `False` | Load data in the same order each time (useful for debugging) |
`--num_threads` | int | `4` | Number of threads for data loading |
`--batch_size` | int | `1` | Input batch size |
`--load_size` | int | `286` | Scale images to this size before cropping to `crop_size` |
`--crop_size` | int | `256` | Crop images to this size |
`--max_dataset_size` | int | `inf` | Maximum number of samples to load from the dataset |
`--preprocess` | string | `resize_and_crop` | Preprocessing: `resize_and_crop`, `crop`, `scale_width`, `scale_width_and_crop`, `none` |
`--no_flip` | flag | `False` | Do not randomly flip images during training |
`--display_winsize` | int | `256` | Display window size for visdom and HTML |
`--epoch` | string | `latest` | Which epoch to load: `latest` or a specific epoch number |
`--load_iter` | int | `0` | If > 0, load the model saved at this iteration instead of by epoch |
`--verbose` | flag | `False` | Print more debugging information |
`--suffix` | string | `''` | Customized suffix appended to the experiment name |
`--use_wandb` | flag | `False` | Use Weights & Biases for logging |
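For example, a hypothetical run that overrides several of these defaults at once (the dataset path and experiment name are placeholders):

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --netG resnet_6blocks --norm instance --load_size 286 --crop_size 256 --batch_size 1
```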
Additional options available during training:
Option | Type | Default | Description |
---|---|---|---|
`--display_freq` | int | `400` | Frequency (in iterations) of showing training results on screen |
`--display_ncols` | int | `4` | Number of images per row in the visdom display |
`--display_id` | int | `1` | Window ID of the visdom display (set to `0` to disable visdom) |
`--display_server` | string | `http://localhost` | Visdom server address |
`--display_env` | string | `main` | Visdom environment name |
`--display_port` | int | `8097` | Visdom port |
`--update_html_freq` | int | `1000` | Frequency of saving training results to HTML |
`--print_freq` | int | `100` | Frequency of printing training losses to the console |
`--no_html` | flag | `False` | Do not save intermediate training results to the web directory |
`--save_latest_freq` | int | `5000` | Frequency (in iterations) of saving the latest model |
`--save_epoch_freq` | int | `5` | Frequency (in epochs) of saving checkpoints |
`--save_by_iter` | flag | `False` | Save models by iteration instead of by epoch |
`--continue_train` | flag | `False` | Continue training from the last checkpoint |
`--epoch_count` | int | `1` | Starting epoch count (useful with `--continue_train`) |
`--n_epochs` | int | `100` | Number of epochs at the initial learning rate |
`--n_epochs_decay` | int | `100` | Number of epochs over which the learning rate decays linearly to zero |
`--phase` | string | `train` | Phase: `train` or `test` |
`--beta1` | float | `0.5` | Momentum term for the Adam optimizer |
`--lr` | float | `0.0002` | Initial learning rate |
`--gan_mode` | string | `lsgan` | GAN loss: `vanilla`, `lsgan`, or `wgangp` |
`--pool_size` | int | `50` | Size of the image buffer holding previously generated images (reduces oscillation) |
`--lr_policy` | string | `linear` | Learning rate policy: `linear`, `step`, `plateau`, `cosine` |
`--lr_decay_iters` | int | `50` | Decay interval for the `step` learning rate policy |
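For example, a run resumed from the latest checkpoint (the dataset path and experiment name are placeholders; `--epoch_count` keeps the epoch numbering continuous):

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --continue_train --epoch_count 101
```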
Additional options available during testing:
Option | Type | Default | Description |
---|---|---|---|
`--phase` | string | `test` | Testing phase |
`--eval` | flag | `False` | Use evaluation mode during test time |
`--num_test` | int | `50` | Number of test images to process |
`--aspect_ratio` | float | `1.0` | Aspect ratio of result images |
`--results_dir` | string | `./results/` | Directory to save test results |
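For example, a hypothetical test run that processes 200 images in eval mode and writes to a custom folder:

```bash
python test.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --eval --num_test 200 --results_dir ./results/my_experiment_eval/
```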
Options specific to the CycleGAN model:

Option | Type | Default | Description |
---|---|---|---|
`--lambda_A` | float | `10.0` | Weight for the forward cycle loss (A -> B -> A) |
`--lambda_B` | float | `10.0` | Weight for the backward cycle loss (B -> A -> B) |
`--lambda_identity` | float | `0.5` | Weight for the identity mapping loss (set to `0` to disable) |
Options specific to the pix2pix model:

Option | Type | Default | Description |
---|---|---|---|
`--lambda_L1` | float | `100.0` | Weight for the L1 loss between generated and target images |
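These weights are ordinary command-line flags, so they can be tuned per run; two hypothetical examples:

```bash
# pix2pix with a weaker L1 term
python train.py --dataroot ./datasets/facades --name facades_l1_50 --model pix2pix --direction BtoA --lambda_L1 50.0

# CycleGAN with the identity loss disabled
python train.py --dataroot ./datasets/maps --name maps_no_idt --model cycle_gan --lambda_identity 0.0
```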
Example training commands:

```bash
# Train a CycleGAN model
python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

# Train a pix2pix model
python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
```

Example testing commands:

```bash
# Test a CycleGAN model
python test.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

# Test a pix2pix model
python test.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
```

To apply a pre-trained model to a single folder of images:

```bash
# Download a pre-trained model
bash ./scripts/download_cyclegan_model.sh horse2zebra

# Test the model
python test.py --dataroot datasets/horse2zebra/testA --name horse2zebra_pretrained --model test --no_dropout
```
The repository supports various dataset formats. Here's how to prepare them:
For paired image-to-image translation (pix2pix), the dataset should be structured as follows:
```
custom_dataset/
├── train/
│   └── [paired images: input and output side by side]
├── val/
│   └── [paired images: input and output side by side]
└── test/
    └── [paired images: input and output side by side]
```
Each image should contain both the input and output side by side. You can use the script `scripts/combine_A_and_B.py` to combine images from two folders into paired images:

```bash
python scripts/combine_A_and_B.py --fold_A /path/to/input_images --fold_B /path/to/output_images --fold_AB /path/to/output_directory
```
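By default, `combine_A_and_B.py` expects `fold_A` and `fold_B` to contain the same split subdirectories with identically named images in each; a hypothetical layout before combining:

```
/path/to/input_images/        /path/to/output_images/
├── train/                    ├── train/
│   ├── 1.jpg                 │   ├── 1.jpg
│   └── 2.jpg                 │   └── 2.jpg
└── test/                     └── test/
```

The horizontally concatenated pairs are written to the matching subdirectories of `fold_AB`.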
For unpaired image-to-image translation (CycleGAN), organize your dataset as follows:
```
custom_dataset/
├── trainA/
│   └── [images from domain A]
├── trainB/
│   └── [images from domain B]
├── testA/
│   └── [images from domain A for testing]
└── testB/
    └── [images from domain B for testing]
```
The repository provides scripts to download various datasets:
```bash
# Download CycleGAN datasets
bash ./datasets/download_cyclegan_dataset.sh [dataset_name]

# Download pix2pix datasets
bash ./datasets/download_pix2pix_dataset.sh [dataset_name]
```
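For example, to fetch the horse2zebra dataset into `./datasets/horse2zebra`:

```bash
bash ./datasets/download_cyclegan_dataset.sh horse2zebra
```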
Available dataset names for CycleGAN: `apple2orange`, `summer2winter_yosemite`, `horse2zebra`, `monet2photo`, `cezanne2photo`, `ukiyoe2photo`, `vangogh2photo`, `maps`, `cityscapes`, `facades`, `iphone2dslr_flower`.

Available dataset names for pix2pix: `facades`, `cityscapes`, `maps`, `edges2shoes`, `edges2handbags`.
To implement a custom dataset:
1. Create a Dataset Class: Create a new file in the `data` directory, for example `custom_dataset.py`:

   ```python
   import os

   from PIL import Image

   from data.base_dataset import BaseDataset, get_transform
   from data.image_folder import make_dataset


   class CustomDataset(BaseDataset):
       def __init__(self, opt):
           BaseDataset.__init__(self, opt)
           self.dir_A = os.path.join(opt.dataroot, 'trainA')  # path to images in domain A
           self.dir_B = os.path.join(opt.dataroot, 'trainB')  # path to images in domain B
           self.A_paths = sorted(make_dataset(self.dir_A, opt.max_dataset_size))
           self.B_paths = sorted(make_dataset(self.dir_B, opt.max_dataset_size))
           self.A_size = len(self.A_paths)
           self.B_size = len(self.B_paths)
           self.transform_A = get_transform(opt)
           self.transform_B = get_transform(opt)

       def __getitem__(self, index):
           A_path = self.A_paths[index % self.A_size]
           B_path = self.B_paths[index % self.B_size]
           A_img = Image.open(A_path).convert('RGB')
           B_img = Image.open(B_path).convert('RGB')
           A = self.transform_A(A_img)
           B = self.transform_B(B_img)
           return {'A': A, 'B': B, 'A_paths': A_path, 'B_paths': B_path}

       def __len__(self):
           return max(self.A_size, self.B_size)
   ```
2. Register the Dataset: Recent versions of the repository discover dataset classes automatically: `data/__init__.py` resolves `--dataset_mode custom` to the file `data/custom_dataset.py` containing a class named `CustomDataset`, so no registration code may be needed. If your copy instead dispatches explicitly inside `create_dataset`, add a branch for the new class (a sketch; note the dataset is constructed with `opt`, matching the class defined in step 1):

   ```python
   def create_dataset(opt):
       if opt.dataset_mode == 'aligned':
           from data.aligned_dataset import AlignedDataset
           dataset = AlignedDataset(opt)
       elif opt.dataset_mode == 'unaligned':
           from data.unaligned_dataset import UnalignedDataset
           dataset = UnalignedDataset(opt)
       elif opt.dataset_mode == 'custom':  # add this branch
           from data.custom_dataset import CustomDataset
           dataset = CustomDataset(opt)
       # ... other dataset modes
       else:
           raise ValueError("Dataset [%s] not recognized." % opt.dataset_mode)
       print("dataset [%s] was created" % type(dataset).__name__)
       return dataset
   ```
3. Use Your Custom Dataset:

   ```bash
   python train.py --dataroot ./datasets/your_custom_dataset --name custom_experiment --model cycle_gan --dataset_mode custom
   ```
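Testing works the same way, reusing the experiment name from training:

```bash
python test.py --dataroot ./datasets/your_custom_dataset --name custom_experiment --model cycle_gan --dataset_mode custom
```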
Some practical tips for training:

- Appropriate Batch Size: Start with batch size 1 and increase it if your GPU has enough memory.
- Learning Rate: The default learning rate (0.0002) works well in most cases, but you may need to adjust it for your specific dataset.
- Generator Architecture: `resnet_9blocks` works well for CycleGAN, while `unet_256` is suitable for pix2pix.
- Image Size: The default crop size is 256x256. For larger images, you may need to adjust the network architecture.
- Monitoring: Use `--display_freq` and `--print_freq` to monitor training progress.
- Saving Frequency: Adjust `--save_latest_freq` and `--save_epoch_freq` based on your dataset size.
- Normalization: `instance` normalization works well for style transfer, while `batch` normalization is often better for other applications.
- Multi-GPU Training: Use `--gpu_ids` to specify multiple GPUs for faster training (see the example after this list).
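A hypothetical multi-GPU run combining several of these tips (the GPU IDs, batch size, and paths depend on your setup):

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --gpu_ids 0,1 --batch_size 4 --norm batch
```

With batch sizes above 1, `batch` normalization can be worth trying in place of the default `instance` normalization.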
Common issues and how to address them:

- Out of Memory: Reduce the batch size or image size, or use a smaller network architecture.
- Poor Image Quality: Adjust the loss weights (`--lambda_A`, `--lambda_B`, `--lambda_identity`, `--lambda_L1`).
- Mode Collapse: Try a larger image buffer (`--pool_size`) or adjust the GAN mode.
- Unstable Training: Adjust the learning rate or try a different GAN loss (`--gan_mode`); see the example after this list.
- Slow Training: Use multi-GPU training or reduce the image size.
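For instance, a hypothetical re-run that addresses instability by switching the GAN loss, enlarging the image buffer, and lowering the learning rate:

```bash
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan \
    --gan_mode vanilla --pool_size 100 --lr 0.0001
```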