---
title: "Setting up a local LLM with Ollama"
author: "Nafeu Nasir"
description: "Getting up and running with Ollama and integrating it into your workflow."
date: 2024-01-24
---
# What is a local large language model (LLM)?

You've heard of LLMs: ChatGPT, Bard, Grok. A local LLM is one you run on your own computer, which means you don't need to rely on a cloud service to use it (better data privacy and security).
<iframe src="https://giphy.com/embed/qAtZM2gvjWhPjmclZE" width="480" height="296" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/southpark-south-park-deep-learning-s26e4-qAtZM2gvjWhPjmclZE">via GIPHY</a></p> | |
+++ | |
# What is Ollama?

Ollama allows you to run open-source large language models, such as Llama 2, locally.

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage.

<div align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
    <img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
  </picture>
</div>

+++
# What kind of computer does it require?

Ollama generally supports machines with 8GB of memory (preferably VRAM). Mac and Linux machines are both supported, although on Linux you'll currently need an Nvidia GPU for GPU acceleration.

You'll see a noticeable difference running with GPU acceleration or on Apple Silicon versus a non-Apple Silicon CPU.

Models are typically sized to use 1/2 to 2/3 of total machine RAM, and the more RAM you can allocate, the better the performance. So with `16gb` of RAM you'd allocate about `8gb` for the model, 32gb -> 16gb for the model, 64gb -> 32gb for the model, and so on.

<iframe src="https://giphy.com/embed/P7JmDW7IkB7TW" width="480" height="328" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/one-direction-zayn-malik-abhhhhhhhhhhhhhhhhhhhhhhhh-P7JmDW7IkB7TW">via GIPHY</a></p>

+++
# How do you set it up?

### macOS

Visit Ollama's website at `ollama.ai` and hit download.

### Windows

Coming soon! For now, you can install Ollama on Windows via WSL2.

### Linux & WSL2

```
curl https://ollama.ai/install.sh | sh
```
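
A quick sanity check after the script finishes (assuming the installer put `ollama` on your `PATH`):

```
ollama --version   # confirm the CLI is installed
ollama serve       # start the server (skip if it's already running in the background)
```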
<iframe src="https://giphy.com/embed/7hJZcKzjIufeOmqKSj" width="480" height="356" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/grandma-grandmother-offline-granny-7hJZcKzjIufeOmqKSj">via GIPHY</a></p> | |
+++ | |
# Quickstart

To run and chat with [Llama 2](https://ollama.ai/library/llama2):

```
ollama run llama2
```
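
This drops you into an interactive chat session. A couple of handy in-chat commands (type `/?` inside the session for the full list in your build):

```
>>> /?     # show available commands
>>> /bye   # exit the session
```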
<iframe src="https://giphy.com/embed/gGldiUgAUOJ04g1Ves" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/europeanathletics-race-mascot-start-gGldiUgAUOJ04g1Ves">via GIPHY</a></p> | |
+++ | |
# What are some of the models available?

Ollama supports a list of open-source models available on [ollama.ai/library](https://ollama.ai/library 'ollama model library'). Here are some example open-source models that can be downloaded:

| Model              | Parameters | Size  | Download                       |
| ------------------ | ---------- | ----- | ------------------------------ |
| Llama 2            | 7B         | 3.8GB | `ollama run llama2`            |
| Mistral            | 7B         | 4.1GB | `ollama run mistral`           |
| Dolphin Phi        | 2.7B       | 1.6GB | `ollama run dolphin-phi`       |
| Phi-2              | 2.7B       | 1.7GB | `ollama run phi`               |
| Neural Chat        | 7B         | 4.1GB | `ollama run neural-chat`       |
| Starling           | 7B         | 4.1GB | `ollama run starling-lm`       |
| Code Llama         | 7B         | 3.8GB | `ollama run codellama`         |
| Llama 2 Uncensored | 7B         | 3.8GB | `ollama run llama2-uncensored` |
| Llama 2 13B        | 13B        | 7.3GB | `ollama run llama2:13b`        |
| Llama 2 70B        | 70B        | 39GB  | `ollama run llama2:70b`        |
| Orca Mini          | 3B         | 1.9GB | `ollama run orca-mini`         |
| Vicuna             | 7B         | 3.8GB | `ollama run vicuna`            |
| LLaVA              | 7B         | 4.5GB | `ollama run llava`             |

> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
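
If you'd rather download a model ahead of time instead of on first run, `ollama pull` fetches just the weights, and `ollama list` shows what's installed locally:

```
ollama pull mistral   # download the model without starting a chat
ollama list           # list downloaded models and their sizes
```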
+++

# How can I customize the model with prompts?

Create a `Modelfile`:

```
FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```

Next, create and run the model:

```
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
```
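
The custom model is managed like any other: if you tweak the `Modelfile`, re-run `ollama create` to rebuild it, and remove it with `ollama rm` when you're done experimenting:

```
ollama list       # "mario" now shows up alongside the base models
ollama rm mario   # delete the custom model
```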
+++

# Some cool features:

### Multiline input

For multiline input, you can wrap text with `"""`:

```
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
```

### Multimodal models

With a multimodal model such as LLaVA, you can include an image path right in the prompt:

```
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
```
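
To try this yourself, run LLaVA one-shot from the shell (the image path and question below are placeholders):

```
ollama run llava "What's in this image? ./smile.png"
```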
### Pass in prompt as arguments

```
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
```
+++

# How can I interact with the model outside of the terminal?

## REST API

Ollama ships with a REST API for running and managing models. Start it with `ollama serve` if it isn't already running (the server also runs in the background whenever you use `ollama run`):

### Generate a response

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
```
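
By default the API streams the answer back as a series of JSON objects. If you'd rather receive one complete JSON response, set the API's `stream` field to `false`:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```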
### Chat with a model

```
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
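
As a minimal scripting sketch: with `"stream": false`, the chat endpoint returns a single JSON object whose reply text sits under `message.content`, so you can extract it with `jq` (assuming `jq` is installed):

```
curl -s http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "stream": false,
  "messages": [{ "role": "user", "content": "why is the sky blue?" }]
}' | jq -r '.message.content'
```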
+++

# Check out Fireship.io's great video on running Mistral's 8x7B model

<iframe width="560" height="315" src="https://www.youtube.com/embed/GyllRd2E6fg?si=_tG71hDNLswF1CSX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

+++

# Let's play with the model `codellama`

<iframe src="https://giphy.com/embed/vIqU5gwdCPKO4" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/llama-vIqU5gwdCPKO4">via GIPHY</a></p>

+++

# What kind of editor integrations are there?

### VSCode

[continue](https://github.com/continuedev/continue)