llama-cpp-python to host a local LLM inside a no-code visual programming chat flow

Part 1/2: Set up an OpenAI-compatible, CPU-based local LLM

  1. What is llama-cpp-python

    • llama-cpp-python offers a web server that aims to act as a drop-in replacement for the OpenAI API. This lets you use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
    • https://github.com/abetlen/llama-cpp-python
  2. Installation

pip install 'llama-cpp-python[server]'

  3. Download the model from Hugging Face: https://huggingface.co/TheBloke/LLaMa-13B-GGML/blob/main/llama-13b.ggmlv3.q4_0.bin and save it in ./models

  4. Start the server:

python3 -m llama_cpp.server --model models/llama-13b.ggmlv3.q4_0.bin

  5. Navigate to http://localhost:8000/docs to see the OpenAPI documentation.

  6. Try it! (see the snippet below)
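Since the server emulates the OpenAI API, the pre-1.0 `openai` Python package (current when this gist was written) can be pointed at it by overriding the base URL. A minimal sketch; the model label and prompt are placeholders, and the local server does not check API keys:

```python
import openai

openai.api_base = "http://localhost:8000/v1"  # local llama-cpp-python server
openai.api_key = "sk-local"  # placeholder; the local server ignores the key

response = openai.ChatCompletion.create(
    model="llama-13b",  # label only; the server answers with whatever model it loaded
    messages=[{"role": "user", "content": "Name three uses of a local LLM."}],
)
print(response["choices"][0]["message"]["content"])
```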

  7. Observations:

    • Check http://localhost:8000/docs to see the server functionally emulating an OpenAI model (same request and response shapes)
    • Measure tokens/sec on your machine, e.g. with the timing sketch below
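To get a rough tokens/sec number, time a request and divide by the `completion_tokens` count that the OpenAI response schema (and thus the emulated server) reports. A sketch under the same assumptions as above:

```python
import time
import openai

openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-local"

start = time.time()
response = openai.ChatCompletion.create(
    model="llama-13b",
    messages=[{"role": "user", "content": "Write a short paragraph about llamas."}],
)
elapsed = time.time() - start

tokens = response["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tokens/sec")
```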
  8. Next:

    • Try with other models
    • Try with GPU support (see the rebuild note below)
    • Dockerize
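For the GPU bullet, the llama-cpp-python README (at the time of writing) documented rebuilding the package with cuBLAS enabled; treat the exact flags below as an assumption and check the current README:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall 'llama-cpp-python[server]'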

Part 2/2: Use it in a LangChain pipeline with Flowise

  1. What is Flowise

    • Open-source, no-code visual programming tool for building LLM apps
  2. Setting up Flowise (see the commands below)
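The original leaves this step blank; one common route, per the Flowise README, is to clone and build from source, which matches the `yarn start` below (treat the exact commands as an assumption):

git clone https://github.com/FlowiseAI/Flowise.git
cd Flowise
yarn install
yarn build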

  3. Start the app:

yarn start

  4. Access the app at http://localhost:3000

  5. Design a simple conversational application

(screenshot: flowise_port_8000_python_llama)
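The screenshot itself is not recoverable here; judging from its name, the flow points a ChatOpenAI-style node at the local llama server from Part 1 (base path http://localhost:8000/v1). Treat that reading as an assumption.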

  6. Call the flow from Python

(screenshot: flowise_api)
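The screenshot showed a Python call; Flowise exposes each chatflow over a REST prediction endpoint, so a minimal sketch looks like this (the chatflow ID is a placeholder copied from the Flowise UI, and the response shape can vary with the flow and Flowise version):

```python
import requests

CHATFLOW_ID = "<your-chatflow-id>"  # copy from the Flowise UI
API_URL = f"http://localhost:3000/api/v1/prediction/{CHATFLOW_ID}"

def query(question: str):
    # POST the question to the chatflow's prediction endpoint
    response = requests.post(API_URL, json={"question": question})
    response.raise_for_status()
    return response.json()

print(query("What can you tell me about llamas?"))
```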

  7. Call the flow with curl

(screenshot: flowise_curl)
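Equivalently from the command line (same placeholder chatflow ID):

curl http://localhost:3000/api/v1/prediction/<your-chatflow-id> -X POST -H "Content-Type: application/json" -d '{"question": "What can you tell me about llamas?"}'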
