-
What is llama-cpp-python
- llama-cpp-python offers a web server that aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
- https://github.com/abetlen/llama-cpp-python
-
Installation
pip install 'llama-cpp-python[server]'
-
Download the model from Hugging Face: https://huggingface.co/TheBloke/LLaMa-13B-GGML/blob/main/llama-13b.ggmlv3.q4_0.bin and save it in ./models (note: this is the older GGML v3 format; current llama.cpp and llama-cpp-python releases expect GGUF files, so this model requires a correspondingly older llama-cpp-python version)
-
Start the server:
python3 -m llama_cpp.server --model models/llama-13b.ggmlv3.q4_0.bin
-
Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
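With the server up, you can query it like the OpenAI API. A minimal sketch using only the standard library, assuming the server from the step above is listening on localhost:8000 and exposes the OpenAI-style /v1/completions route (which the Swagger page lets you verify):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # llama-cpp-python server started above

def build_completion_request(prompt, max_tokens=64, temperature=0.7):
    """Build the JSON body for the OpenAI-style /v1/completions endpoint."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, **kwargs):
    """POST a prompt to the local server and return the first completion text."""
    body = json.dumps(build_completion_request(prompt, **kwargs)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["text"]

# complete("Q: Name the planets in the solar system. A: ")  # needs the server running
```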
-
Try it!
- Use the Swagger interface at http://localhost:8000/docs
-
Observation:
- Confirm that http://localhost:8000/docs shows a functional emulation of the OpenAI API (request and response formats)
- Measure the tokens per second you get on your machine
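One way to measure tokens per second: the OpenAI-style response carries a usage object with a completion_tokens count, so dividing it by the wall-clock time of the request gives throughput. A small sketch (the request itself is left as a comment since it needs the running server):

```python
import time

def tokens_per_second(completion_tokens, elapsed_seconds):
    """Throughput from a response's usage.completion_tokens and wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_seconds

# Typical measurement loop (assumes an OpenAI-style client like `complete` above):
# start = time.perf_counter()
# response = ...  # call /v1/completions and keep the parsed JSON
# elapsed = time.perf_counter() - start
# print(tokens_per_second(response["usage"]["completion_tokens"], elapsed))
```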
-
Next:
- Try with other models
- Try with GPU support
- Dockerize
-
What is Flowise
- Open-source, no-code visual programming tool for building LLM apps
-
Setting up Flowise
- Follow the instructions at: https://github.com/FlowiseAI/Flowise#-developers
-
Start the app:
yarn start
-
Access the app at http://localhost:3000
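Once a chatflow is saved in Flowise, it can be called over HTTP. A minimal sketch, assuming Flowise's REST prediction route (/api/v1/prediction/{chatflowId}) and a placeholder chatflow ID — copy the real ID from the Flowise UI:

```python
import json
import urllib.request

FLOWISE_URL = "http://localhost:3000"
CHATFLOW_ID = "your-chatflow-id"  # placeholder: take the real ID from the Flowise UI

def build_prediction_request(question):
    """URL and JSON body for Flowise's prediction endpoint."""
    url = f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}"
    return url, {"question": question}

def ask(question):
    """POST a question to the chatflow and return the parsed JSON response."""
    url, payload = build_prediction_request(question)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# ask("What can you do?")  # requires Flowise running with a saved chatflow
```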
-
Design a simple conversational application, then query it from a client:
- Python
- curl
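For the Python client, one approach is a small loop against the OpenAI-style /v1/chat/completions endpoint of the llama-cpp-python server started earlier (an assumption; the same loop works against any OpenAI-compatible endpoint, and the curl variant simply POSTs the same JSON body):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # the llama-cpp-python server started earlier

def append_turn(history, role, content):
    """Append one OpenAI-style message to the conversation history."""
    history.append({"role": role, "content": content})
    return history

def build_chat_request(history, max_tokens=128):
    """JSON body for the OpenAI-style /v1/chat/completions endpoint."""
    return {"messages": history, "max_tokens": max_tokens}

def chat(history, user_message):
    """Send the running conversation to the local server; return the reply text."""
    append_turn(history, "user", user_message)
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(history)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    append_turn(history, "assistant", reply)  # keep context for the next turn
    return reply

# history = [{"role": "system", "content": "You are a helpful assistant."}]
# chat(history, "Hi!")  # requires the server to be running
```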