Get examples of Intel Neural Compressor (INC) up and running with an existing trained model. We use HuggingFace's Optimum as the frontend, with INC as its backend, and aim to reproduce the static quantization example provided by Optimum out of the box.
- Create and activate a conda environment
conda create -n optimum-inc python=3.8
conda activate optimum-inc
- Set up Intel Neural Compressor per its landing page, but slightly differently here: an editable (develop) install is more convenient for development.
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git checkout tags/v1.9 -b v1.9
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py develop
- Setup HuggingFace Optimum
git clone https://github.com/huggingface/optimum
cd optimum
pip install -e .
- Quantize distilbert/sst2 (per documentation)
cd optimum/examples/inc/pytorch/text-classification
pip install -r requirements.txt
pip install torch==1.9.1  # !!! the latest 1.10 does not work with this example
python run_glue.py \
--model_name_or_path distilbert-base-uncased-finetuned-sst-2-english \
--task_name sst2 \
--quantize \
--quantization_approach static \
--do_train \
--do_eval \
--dataloader_drop_last \
--verify_loading \
--output_dir /tmp/sst2_output
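The `static` approach above calibrates fixed scale/zero-point parameters from sample data and then maps float tensors to 8-bit integers. A toy pure-Python sketch of the standard affine scheme illustrates the arithmetic (this is not INC's implementation; the function names are made up for illustration):

```python
# Toy sketch of the affine int8 quantization arithmetic used by static
# post-training quantization: q = round(x / scale) + zero_point, clamped.
# NOT INC's code -- a numerical illustration only.

def quantize(values, num_bits=8):
    """Affine-quantize floats to unsigned ints; return (q, scale, zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must include 0 so it maps exactly
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized ints back to (approximate) floats."""
    return [(qi - zero_point) * scale for qi in q]

x = [-1.0, 0.0, 0.5, 2.0]
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
# round-trip error is bounded by one quantization step
assert all(abs(a - b) < scale for a, b in zip(x, x_hat))
```

In a real calibration pass, the min/max (or a histogram-based range) would come from representative inputs rather than from the tensor being quantized.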
- Quantize bert-base/MRPC
cd optimum/examples/inc/pytorch/text-classification
pip install -r requirements.txt
pip install torch==1.9.1  # !!! the latest 1.10 does not work with this example
python run_glue.py \
--model_name_or_path bert-base-cased-finetuned-mrpc \
--task_name mrpc \
--quantize \
--quantization_approach static \
--do_train \
--do_eval \
--dataloader_drop_last \
--verify_loading \
--output_dir /tmp/mrpc_output
- VS Code launch.json (for debugging run_glue.py)
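A minimal launch.json sketch for stepping through the sst2 run under the VS Code debugger. This is an assumption, not from the Optimum docs: the `program` path presumes the workspace folder is the optimum checkout; the args mirror the sst2 command line above. Adjust paths to your layout.

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "run_glue sst2 static quant",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/examples/inc/pytorch/text-classification/run_glue.py",
            "console": "integratedTerminal",
            "justMyCode": false,
            "args": [
                "--model_name_or_path", "distilbert-base-uncased-finetuned-sst-2-english",
                "--task_name", "sst2",
                "--quantize",
                "--quantization_approach", "static",
                "--do_train",
                "--do_eval",
                "--dataloader_drop_last",
                "--verify_loading",
                "--output_dir", "/tmp/sst2_output"
            ]
        }
    ]
}
```

With `justMyCode` set to false, breakpoints also bind inside the installed neural-compressor and optimum sources, which is the point of the editable installs above.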