Llama.cpp Client for Python

This Python client allows you to interact with a locally deployed llama.cpp server using an OpenAI-compatible API. It provides a simple interface to send prompts and receive responses from your local language model.

Setup

  1. Start the llama.cpp server:

    Navigate to your llama.cpp directory and run:

    ./llama-server -m /path/to/your/model -c 8000 --gpu-layers 100 -np 5
    # -m: model path, -c: context size in tokens, --gpu-layers: layers to offload to the GPU, -np: number of parallel sequences
    

    The API server will be accessible at http://localhost:8080/v1 (see the connectivity check after this list).

  2. Install required Python package:

    pip install openai
    
  3. Download the client:

    Save llama_cpp_client.py in your project directory.
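
Once the server from step 1 is running, you can sanity-check the connection. A minimal sketch, assuming the default port 8080 and that your llama.cpp build exposes the OpenAI-compatible /v1/models endpoint:

    import openai

    # Point the standard openai client at the local llama.cpp server.
    client = openai.OpenAI(base_url="http://localhost:8080/v1",
                           api_key="sk-no-key-required")

    # Listing models should show the model loaded by llama-server.
    for model in client.models.list():
        print(model.id)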

Usage

  1. Import the client:

    from llama_cpp_client import LlamaCppClient
  2. Create a client instance:

    llama_client = LlamaCppClient()
  3. Send a prompt:

    response = llama_client.call_model("Your prompt here")
    print(response)
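
If you want output as it is generated, the wrapper itself does not expose streaming, but the underlying openai client it wraps does. A minimal sketch, assuming your llama.cpp server supports streamed chat completions (recent builds do):

    from llama_cpp_client import LlamaCppClient

    llama_client = LlamaCppClient()

    # stream=True yields chunks; each chunk's delta may carry a piece of text.
    stream = llama_client.client.chat.completions.create(
        model="gpt-3.5-turbo",  # the model name is ignored by llama.cpp
        messages=[{"role": "user", "content": "Your prompt here"}],
        stream=True,
    )
    for chunk in stream:
        piece = chunk.choices[0].delta.content
        if piece:
            print(piece, end="", flush=True)
    print()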

Notes

  • This client is designed to work with a llama.cpp server that implements an OpenAI-compatible API. Make sure your llama.cpp server is configured correctly.
  • The API key is set to "sk-no-key-required" by default, as local deployments typically don't require authentication.

By using this client, you can interact with your locally deployed llama.cpp model in the same way you would with the OpenAI API, making it easy to switch between different backends or compare results.
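
For example, pointing the same class at OpenAI's hosted API only requires a different base_url and a real key. A minimal sketch (the OPENAI_API_KEY environment variable and the hosted model name are illustrative assumptions, not part of the gist):

    import os
    from llama_cpp_client import LlamaCppClient

    # Local llama.cpp backend (uses the class defaults).
    local = LlamaCppClient()

    # Hosted OpenAI backend: same interface, different endpoint and key.
    hosted = LlamaCppClient(base_url="https://api.openai.com/v1",
                            api_key=os.environ["OPENAI_API_KEY"])

    print(local.call_model("Say hi"))
    print(hosted.call_model("Say hi", model_id="gpt-4o-mini"))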

import openai


class Client:
    """Minimal base class for chat-completion clients."""

    def __init__(self, api_key=None):
        self.api_key = api_key

    def call_model(self, prompt, model_id):
        raise NotImplementedError("Subclasses must implement call_model")


class LlamaCppClient(Client):
    """Client for a local llama.cpp server exposing an OpenAI-compatible API."""

    def __init__(self, base_url="http://localhost:8080/v1", api_key="sk-no-key-required"):
        super().__init__(api_key)
        self.base_url = base_url
        self.client = openai.OpenAI(base_url=base_url, api_key=self.api_key)

    def call_model(self, prompt, model_id="gpt-3.5-turbo"):
        # llama.cpp serves whichever model it was started with, so the
        # model_id sent here is effectively ignored by the server.
        completion = self.client.chat.completions.create(
            model=model_id,
            messages=[
                {"role": "system", "content": "You are an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
                {"role": "user", "content": prompt},
            ],
        )
        return completion.choices[0].message.content


# Usage example
if __name__ == "__main__":
    llama_client = LlamaCppClient()
    response = llama_client.call_model("Write a limerick about python exceptions")
    print(response)