---
title: "Setting up a local LLM with Ollama"
author: "Nafeu Nasir"
description: "Getting up and running with Ollama and integrating it into your workflow."
date: 2024-01-24
---
# What is a local large language model (LLM)?

You've heard of LLMs: ChatGPT, Bard, Grok. A local LLM is one you run on your own computer, which means you don't need to rely on a cloud service to use it (better data privacy and security).
<iframe src="https://giphy.com/embed/qAtZM2gvjWhPjmclZE" width="480" height="296" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/southpark-south-park-deep-learning-s26e4-qAtZM2gvjWhPjmclZE">via GIPHY</a></p> | |
+++ | |
# What is Ollama?

Ollama allows you to run open-source large language models, such as Llama 2, locally.

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage.

<div align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
    <img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
  </picture>
</div>

+++
# What kind of computer does it require?

Ollama generally supports machines with 8GB of memory (preferably VRAM). Mac and Linux machines are both supported, although on Linux you'll currently need an Nvidia GPU for GPU acceleration.

You'll see a noticeable difference running with GPU acceleration or on Apple Silicon versus a non-Apple Silicon CPU.

Models are typically sized to use 1/2 to 2/3 of total machine RAM, and the more RAM you can allocate, the better the performance. So with `16gb` of RAM you'd allocate about `8gb` for the model, 32gb -> 16gb for the model, 64gb -> 32gb for the model, and so on.

<iframe src="https://giphy.com/embed/P7JmDW7IkB7TW" width="480" height="328" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/one-direction-zayn-malik-abhhhhhhhhhhhhhhhhhhhhhhhh-P7JmDW7IkB7TW">via GIPHY</a></p>

+++
# How do you set it up?

### macOS

Visit Ollama's website at `ollama.ai` and hit download.

### Windows

Coming soon! For now, you can install Ollama on Windows via WSL2.

### Linux & WSL2

```
curl https://ollama.ai/install.sh | sh
```
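
A quick sanity check after the script finishes (assuming the installer put `ollama` on your `PATH`):

```
ollama --version   # confirm the CLI is installed
ollama serve       # start the server (skip if it's already running in the background)
```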
<iframe src="https://giphy.com/embed/7hJZcKzjIufeOmqKSj" width="480" height="356" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/grandma-grandmother-offline-granny-7hJZcKzjIufeOmqKSj">via GIPHY</a></p> | |
+++ | |
# Quickstart

To run and chat with [Llama 2](https://ollama.ai/library/llama2):

```
ollama run llama2
```
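
This drops you into an interactive chat session. A couple of handy in-chat commands (type `/?` inside the session for the full list in your build):

```
>>> /?     # show available commands
>>> /bye   # exit the session
```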
<iframe src="https://giphy.com/embed/gGldiUgAUOJ04g1Ves" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/europeanathletics-race-mascot-start-gGldiUgAUOJ04g1Ves">via GIPHY</a></p> | |
+++ | |
# What are some of the models available?

Ollama supports a list of open-source models available on [ollama.ai/library](https://ollama.ai/library 'ollama model library'). Here are some example open-source models that can be downloaded:

| Model              | Parameters | Size  | Download                       |
| ------------------ | ---------- | ----- | ------------------------------ |
| Llama 2            | 7B         | 3.8GB | `ollama run llama2`            |
| Mistral            | 7B         | 4.1GB | `ollama run mistral`           |
| Dolphin Phi        | 2.7B       | 1.6GB | `ollama run dolphin-phi`       |
| Phi-2              | 2.7B       | 1.7GB | `ollama run phi`               |
| Neural Chat        | 7B         | 4.1GB | `ollama run neural-chat`       |
| Starling           | 7B         | 4.1GB | `ollama run starling-lm`       |
| Code Llama         | 7B         | 3.8GB | `ollama run codellama`         |
| Llama 2 Uncensored | 7B         | 3.8GB | `ollama run llama2-uncensored` |
| Llama 2 13B        | 13B        | 7.3GB | `ollama run llama2:13b`        |
| Llama 2 70B        | 70B        | 39GB  | `ollama run llama2:70b`        |
| Orca Mini          | 3B         | 1.9GB | `ollama run orca-mini`         |
| Vicuna             | 7B         | 3.8GB | `ollama run vicuna`            |
| LLaVA              | 7B         | 4.5GB | `ollama run llava`             |

> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
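
If you'd rather download a model ahead of time instead of on first run, `ollama pull` fetches just the weights, and `ollama list` shows what's installed locally:

```
ollama pull mistral   # download the model without starting a chat
ollama list           # list downloaded models and their sizes
```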
+++

# How can I customize the model with prompts?

Create a `Modelfile`:

```
FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```

Next, create and run the model:

```
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
```
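
The custom model is managed like any other: if you tweak the `Modelfile`, re-run `ollama create` to rebuild it, and remove it with `ollama rm` when you're done experimenting:

```
ollama list       # "mario" now shows up alongside the base models
ollama rm mario   # delete the custom model
```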
+++

# Some cool features:

### Multiline input

For multiline input, you can wrap text with `"""`:

```
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
```

### Multimodal models

With a multimodal model such as LLaVA, you can include an image path right in the prompt:

```
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
```
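
To try this yourself, run LLaVA one-shot from the shell (the image path and question below are placeholders):

```
ollama run llava "What's in this image? ./smile.png"
```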
### Pass in prompt as arguments

```
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
```
+++

# How can I interact with the model outside of the terminal?

## REST API

Ollama ships with a REST API for running and managing models. Start it with `ollama serve` if it isn't already running (the server also runs in the background whenever you use `ollama run`):

### Generate a response

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
```
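
By default the API streams the answer back as a series of JSON objects. If you'd rather receive one complete JSON response, set the API's `stream` field to `false`:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```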
### Chat with a model

```
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
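
As a minimal scripting sketch: with `"stream": false`, the chat endpoint returns a single JSON object whose reply text sits under `message.content`, so you can extract it with `jq` (assuming `jq` is installed):

```
curl -s http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "stream": false,
  "messages": [{ "role": "user", "content": "why is the sky blue?" }]
}' | jq -r '.message.content'
```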
+++

# Check out Fireship.io's great video on running Mistral's 8x7B model

<iframe width="560" height="315" src="https://www.youtube.com/embed/GyllRd2E6fg?si=_tG71hDNLswF1CSX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

+++

# Let's play with the model `codellama`

<iframe src="https://giphy.com/embed/vIqU5gwdCPKO4" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/llama-vIqU5gwdCPKO4">via GIPHY</a></p>

+++

# What kind of editor integrations are there?

### VSCode

[continue](https://github.com/continuedev/continue)