Prepare your environment for model optimization and inference:
sudo apt update
sudo apt install git-lfs -y
Set up an OpenVINO virtual environment, then install the nightly OpenVINO GenAI package and the latest Optimum Intel:
python3 -m venv ov-nightly-env
source ov-nightly-env/bin/activate
python -m pip install --upgrade pip
# Install Nightly OpenVINO GENAI package
pip install --pre -U openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
# Install latest optimum-intel and NNCF
python -m pip install "optimum-intel[nncf]"@git+https://github.com/huggingface/optimum-intel.git
Use the Optimum Intel CLI to export models from the Hugging Face Hub to OpenVINO IR with various levels of weight compression. Note that gated models such as Meta-Llama-3 require accepting the license on the Hub and authenticating first (for example, with huggingface-cli login):
optimum-cli export openvino \
--model "meta-llama/Meta-Llama-3-8B-Instruct" \
--weight-format int4 \
--trust-remote-code \
"meta-llama3-8B-Instruct"
Perform generation using the new GenAI API:
import openvino_genai as ov_genai

model_path = "meta-llama3-8B-Instruct"
# Load the exported OpenVINO IR and target the CPU device
pipe = ov_genai.LLMPipeline(model_path, "CPU")
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
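The pipeline can also stream tokens as they are produced by passing a callback to generate(). Below is a minimal sketch assuming the OpenVINO GenAI streamer convention, where the callback receives each decoded subword and returning False tells the pipeline to continue generating:
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("meta-llama3-8B-Instruct", "CPU")

# Print each decoded subword as soon as it arrives
def streamer(subword):
    print(subword, end="", flush=True)
    return False  # False means "keep generating"; True would stop early

pipe.generate("The Sun is yellow because", streamer=streamer, max_new_tokens=100)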