@ravi9 · Created August 28, 2024 20:15
Set up OpenVINO with nightly builds and the latest Optimum-Intel.

Prepare your environment for model optimization and inference:

sudo apt update
sudo apt install git-lfs -y

Set up an OpenVINO virtual environment and install the nightly packages and the latest Optimum-Intel:

python3 -m venv ov-nightly-env
source ov-nightly-env/bin/activate

python -m pip install --upgrade pip
# Install Nightly OpenVINO GENAI package
pip install --pre -U openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
# Install latest optimum-intel and NNCF
python -m pip install "optimum-intel[nncf]@git+https://github.com/huggingface/optimum-intel.git"
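
As a quick sanity check, you can confirm the environment from Python. This is a minimal sketch: it assumes the packages above installed cleanly and uses importlib.metadata to read the installed wheel versions.

# Sanity check: run inside the activated ov-nightly-env.
from importlib.metadata import version

import openvino as ov

print("OpenVINO runtime:", ov.get_version())        # nightly builds report a dev version string
print("openvino-genai:", version("openvino-genai"))
print("optimum-intel:", version("optimum-intel"))
print("nncf:", version("nncf"))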

Use the Optimum Intel CLI to export models from Hugging Face to OpenVINO IR with various levels of weight compression:

optimum-cli export openvino \
--model "meta-llama/Meta-Llama-3-8B-Instruct" \
--weight-format int4 \
--trust-remote-code \
"meta-llama3-8B-Instruct"

Run generation with the new GenAI API:

import openvino_genai as ov_genai

model_path = "meta-llama3-8B-Instruct"            # directory produced by the export step above
pipe = ov_genai.LLMPipeline(model_path, "CPU")    # load the OpenVINO IR model on CPU
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
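
For token-by-token output, generate also accepts a streamer callback. This is a minimal sketch following the callback convention used in the openvino-genai samples: the callback receives one decoded subword per call and returns False to continue generating.

import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("meta-llama3-8B-Instruct", "CPU")

def streamer(subword):
    # Print each decoded subword as soon as it is produced.
    print(subword, end="", flush=True)
    return False  # returning True would stop generation early

pipe.generate("The Sun is yellow because", streamer=streamer, max_new_tokens=100)
print()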