train-cifar10.md

Objective: train a ResNet-18 on the CIFAR-10 dataset

The ResNet-18 in this repo reaches 93% top-1 accuracy on CIFAR-10. We are not using it because it defines and implements its own ResNet; we would like to use the out-of-the-box torchvision resnet18 definition instead. NNCF provides an image classification example that uses the torchvision ResNet definition.

# Step 1: Create a new virtualenv or conda environment and make sure it is activated

# Step 2: Install VS's fork of NNCF
git clone https://github.com/vuiseng9/nncf
cd nncf
git checkout train-cifar10
python setup.py develop
pip install -r examples/torch/requirements.txt
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# Step 3: Run basic training
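# The path below is relative to the directory that contains the nncf clone.
# If you are still inside the repo root from Step 2, use: cd examples/torch/classification
cd nncf/examples/torch/classification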
./train-resnet18-cifar10.sh
# This basic training should reach about 85% top-1 accuracy. Edit the bash script to point to your own paths before running. TensorBoard logging is integrated in the sample.

# Step 4: Modify hyperparameters to improve training, targeting 93% top-1 accuracy. A few ideas:
1. Check the scheduler used by the reference repo above; we can follow the same hyperparameter settings.
2. The NNCF classification example wraps the torch optimizer/scheduler and builds them from the NNCF config. We can either configure the NNCF JSON to use a different optimizer/scheduler or override them in the code itself.
3. If reproducing the reference repo is helpful, please do so.
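
When comparing scheduler choices for Step 4, it can help to look at the learning-rate curve each one produces before committing to a long run. The sketch below is plain Python (no torch needed) and computes two common schedules in closed form: cosine annealing and multi-step decay. The function names and all values (base LR 0.1, 200 epochs, milestones at 100 and 150, gamma 0.1) are illustrative assumptions, not settings taken from the reference repo or from the NNCF sample config; check both before reusing any numbers.

```python
import math


def cosine_lr(epoch, base_lr=0.1, total_epochs=200, min_lr=0.0):
    """Closed-form cosine annealing: decays base_lr to min_lr over total_epochs."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def multistep_lr(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Step decay: multiply base_lr by gamma once for every milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    # Print both schedules at a few epochs (all hyperparameters are assumed values).
    for epoch in (0, 50, 100, 150, 199):
        print(f"epoch {epoch:3d}  cosine={cosine_lr(epoch):.5f}  multistep={multistep_lr(epoch):.5f}")
```

These two functions follow the same formulas as `torch.optim.lr_scheduler.CosineAnnealingLR` and `MultiStepLR` when stepped once per epoch, so the printed values give a rough preview of what either scheduler would do if wired into the training script.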
