This guide provides detailed instructions for optimizing the microsoft/trocr-base-printed model (https://huggingface.co/microsoft/trocr-base-printed) with OpenVINO.
To prepare your environment for model optimization and inference:
sudo apt update
sudo apt install git-lfs -y
python3 -m venv openvino-env
source openvino-env/bin/activate
pip install --upgrade pip
python -m pip install "optimum-intel[openvino]@git+https://github.com/huggingface/optimum-intel.git"
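To verify the installation, a quick import check can be run. This is a minimal sketch; it only confirms that the packages installed above resolve and lists the devices OpenVINO detects:
# Sanity check: confirm the packages import and list available devices
import openvino as ov
from optimum.intel import OVModelForVision2Seq

print("OpenVINO:", ov.get_version())
print("Devices:", ov.Core().available_devices)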
Optimize your Hugging Face models for inference with the OpenVINO runtime by replacing the standard transformers model classes with the corresponding OpenVINO classes. For example, AutoModelForXxx becomes OVModelForXxx. For TrOCR, use OVModelForVision2Seq as shown below:
from transformers import TrOCRProcessor
from optimum.intel.openvino import OVModelForVision2Seq
from PIL import Image
import requests

# Load a sample handwritten image from the IAM database
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-small-handwritten')
# export=True converts the model to OpenVINO IR on the fly
model = OVModelForVision2Seq.from_pretrained('microsoft/trocr-small-handwritten', export=True)

pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
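Re-exporting on every load adds startup time. Since OpenVINO model classes support the standard save_pretrained API, the converted model can be saved once and reloaded directly. A minimal sketch, with './ov_trocr' as an example path:
# Save the converted OpenVINO model so future loads can skip the export step
model.save_pretrained('./ov_trocr')
processor.save_pretrained('./ov_trocr')

# Reload later without re-exporting
model = OVModelForVision2Seq.from_pretrained('./ov_trocr')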
For 8-bit quantization during model loading, set load_in_8bit=True when calling from_pretrained():
model = OVModelForVision2Seq.from_pretrained('microsoft/trocr-small-handwritten', load_in_8bit=True, export=True)
NOTE: The load_in_8bit option is enabled by default for models with more than 1 billion parameters and can be disabled with load_in_8bit=False.
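Beyond the load_in_8bit shortcut, recent optimum-intel releases also accept an explicit quantization configuration object, which leaves room for additional compression parameters. A sketch under that assumption:
from optimum.intel import OVWeightQuantizationConfig
from optimum.intel.openvino import OVModelForVision2Seq

# Explicit 8-bit weight compression, equivalent to load_in_8bit=True
q_config = OVWeightQuantizationConfig(bits=8)
model = OVModelForVision2Seq.from_pretrained(
    'microsoft/trocr-small-handwritten',
    export=True,
    quantization_config=q_config,
)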
Use the Optimum Intel CLI to export models from Hugging Face to OpenVINO IR with various levels of weight compression:
optimum-cli export openvino --model MODEL_ID --weight-format WEIGHT_FORMAT EXPORT_PATH
Replace the placeholders appropriately:
- MODEL_ID: ID of the Hugging Face model.
- WEIGHT_FORMAT: desired weight format; one of {fp32,fp16,int8,int4,int4_sym_g128,int4_asym_g128,int4_sym_g64,int4_asym_g64}. Refer to the Optimum Intel documentation for more details.
- EXPORT_PATH: directory path for storing the exported OpenVINO model.
- --ratio RATIO: (default: 0.8) compression ratio between primary and backup precision. In the case of INT4, NNCF evaluates layer sensitivity and keeps the most impactful layers in INT8 precision (20% in INT8 by default), which helps preserve accuracy after weight compression.
To see complete usage, execute:
optimum-cli export openvino -h
Example commands to export microsoft/trocr-base-printed with different precision formats (FP16, INT8, and INT4):
optimum-cli export openvino --model microsoft/trocr-base-printed --weight-format fp16 ov_model_fp16
optimum-cli export openvino --model microsoft/trocr-base-printed --weight-format int8 ov_model_int8
optimum-cli export openvino --model microsoft/trocr-base-printed --weight-format int4 ov_model_int4
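A quick way to compare the disk footprint of the exported variants (a simple shell check; the directory names match the export commands above):
du -sh ov_model_fp16 ov_model_int8 ov_model_int4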
After conversion, pass the converted model path as model_id when calling from_pretrained(). You can also select your target device (CPU, GPU, or MULTI:CPU,GPU) through the device argument of that method. See the documentation for other supported device options: AUTO, HETERO, BATCH.
from transformers import TrOCRProcessor
from optimum.intel.openvino import OVModelForVision2Seq
from PIL import Image
import requests

url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

device = "CPU"
# Cache compiled models to speed up subsequent loads
ov_config = {"PERFORMANCE_HINT": "LATENCY", "CACHE_DIR": "./ov_cache"}

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-printed')
# export=False because the model was already converted by the CLI above
model = OVModelForVision2Seq.from_pretrained(model_id='./ov_model_int8', device=device, ov_config=ov_config, export=False)

pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
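To gauge the effect of the different weight formats, a rough latency check can be wrapped around generate(). A minimal sketch that reuses the model and pixel_values from the example above; it is not a rigorous benchmark:
import time

# Warm-up run so one-time compilation does not skew the numbers
model.generate(pixel_values)

runs = 10
start = time.perf_counter()
for _ in range(runs):
    model.generate(pixel_values)
print(f"Average generate() latency: {(time.perf_counter() - start) / runs:.3f} s")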