This Python client allows you to interact with a locally deployed llama.cpp server using an OpenAI-compatible API. It provides a simple interface to send prompts and receive responses from your local language model.
- Start the llama.cpp server. Navigate to your llama.cpp directory and run:

  ```bash
  ./llama-server -m /path/to/your/model -c 8000 --gpu-layers 100 -np 5
  ```

  Here `-m` is the path to the model file, `-c` is the context window size, `--gpu-layers` is the number of layers offloaded to the GPU, and `-np` is the number of parallel sequences. The API server will be accessible at `http://localhost:8080/v1`.

- Install the required Python package:

  ```bash
  pip install openai
  ```

- Download the client: save `llama_cpp_client.py` in your project directory (a minimal sketch of what such a file might contain is shown after the notes below).

- Import the client:

  ```python
  from llama_cpp_client import LlamaCppClient
  ```

- Create a client instance:

  ```python
  llama_client = LlamaCppClient()
  ```

- Send a prompt:

  ```python
  response = llama_client.call_model("Your prompt here")
  print(response)
  ```
- This client is designed to work with a llama.cpp server that implements an OpenAI-compatible API, so make sure the server is configured and running as described above.
- The API key is set to `"sk-no-key-required"` by default, as local deployments typically don't require authentication.
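
For reference, here is a minimal sketch of what `llama_cpp_client.py` might look like. The class name `LlamaCppClient`, the `call_model` method, the base URL `http://localhost:8080/v1`, and the default API key `"sk-no-key-required"` all come from the steps above; the constructor parameters, the placeholder model name, and the single-turn chat format are illustrative assumptions rather than the actual implementation.

```python
# llama_cpp_client.py -- minimal sketch, not the actual implementation
from openai import OpenAI


class LlamaCppClient:
    """Thin wrapper around the OpenAI SDK pointed at a local llama.cpp server."""

    def __init__(
        self,
        base_url: str = "http://localhost:8080/v1",
        api_key: str = "sk-no-key-required",
        model: str = "local-model",  # placeholder; llama.cpp serves whatever model it was started with
    ):
        self.client = OpenAI(base_url=base_url, api_key=api_key)
        self.model = model

    def call_model(self, prompt: str) -> str:
        """Send a single user prompt and return the model's text reply."""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
```

With a client along these lines, the usage shown in the steps above (`LlamaCppClient()` followed by `call_model(...)`) works as written.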
By using this client, you can interact with your locally deployed llama.cpp model in the same way you would with the OpenAI API, making it easy to switch between different backends or compare results.
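
To illustrate that interchangeability, a script written against the official OpenAI Python SDK can be pointed at either backend just by changing the `base_url` and API key; the model name below is a placeholder, and the exact behavior depends on how your server is configured.

```python
from openai import OpenAI

# Local llama.cpp backend (no real key required)
local = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

# Hosted OpenAI backend (reads OPENAI_API_KEY from the environment)
# hosted = OpenAI()

messages = [{"role": "user", "content": "Summarize llama.cpp in one sentence."}]

reply = local.chat.completions.create(model="local-model", messages=messages)
print(reply.choices[0].message.content)
```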