- API: https://github.com/ollama/ollama/blob/main/docs/api.md
- Docker: https://github.com/ollama/ollama/blob/main/docs/docker.md
Warning
Running Ollama from a Docker container on a macOS M3 machine will not use the GPU; Docker cannot access Apple Silicon GPUs, so inference falls back to the CPU
$ docker run -d -v $HOME/.ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
9d2823c57c7ebc57c47456afa05e74a915c64c59c2904b7b5b4cc60a238f1b02
- If you look at the logs, you will notice that only the CPU is used
...
...
ollama-server | time=2025-01-11T01:17:34.798Z level=INFO source=routes.go:1310 msg="Listening on [::]:11434 (version 0.5.4-0-g2ddc32d-dirty)"
ollama-server | time=2025-01-11T01:17:34.799Z level=INFO source=routes.go:1339 msg="Dynamic LLM libraries" runners="[cpu cuda_jetpack5 cuda_jetpack6 cuda_v11 cuda_v12]"
ollama-server | time=2025-01-11T01:17:34.799Z level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
ollama-server | time=2025-01-11T01:17:34.800Z level=INFO source=gpu.go:392 msg="no compatible GPUs were discovered"
ollama-server | time=2025-01-11T01:17:34.801Z level=INFO source=types.go:131 msg="inference compute" id=0 library=cpu variant="no vector extensions" compute="" driver=0.0 name="" total="17.5 GiB" available="15.9 GiB"
- Running Ollama natively on the M3 works, but you must run it outside of Docker
Warning
You must install Ollama from their website
$ OLLAMA_HOST=0.0.0.0:11234 ollama serve
...
...
time=2025-01-10T17:16:24.333-08:00 level=INFO source=routes.go:1310 msg="Listening on [::]:11234 (version 0.5.4)"
time=2025-01-10T17:16:24.333-08:00 level=INFO source=routes.go:1339 msg="Dynamic LLM libraries" runners=[metal]
time=2025-01-10T17:16:24.383-08:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="36.0 GiB" available="36.0 GiB"
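- As a quick sanity check against the custom port, the version endpoint from the API docs should report the same version shown in the startup log:
$ curl -s http://localhost:11234/api/version
{"version":"0.5.4"}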
- Ollama Versions: https://hub.docker.com/r/ollama/ollama/tags
- Pulling a model is very similar to pulling a Docker image, but it uses Ollama's own image structure (manifests and blobs)
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama pull llama2
pulling manifest
pulling 8934d96d3f08... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 59 B
pulling fa304d675061... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 91 B
pulling 42ba7f8a01dd... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
llama2:latest 78e26419b446 3.8 GB 40 minutes ago
- The local Ollama registry dir, which is mapped into the container as a volume
$ find ~/.ollama -name llama2
/Users/marcellodesales/.ollama/models/manifests/registry.ollama.ai/library/llama2
- Verify the running container, its port mappings, etc.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
13354dd76274 ollama/ollama "/bin/ollama serve" 30 hours ago Up 30 hours 0.0.0.0:11434->11434/tcp ollama
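- A quick way to confirm the API is reachable on that port is the root endpoint, which simply reports the server status:
$ curl -s http://localhost:11434/
Ollama is running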
- NOTE: If you want to make this production-ready, this API should be protected (e.g., with OAuth) or disabled altogether.
- NOTE 2: This is a very slow operation because of the streamed JSON progress output... It is mainly useful if you are building a UI on top of it
$ curl -X POST http://localhost:11434/api/pull -d '{
"name": "llama2:latest"
}'
…
…
…
{"status":"downloading sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9","digest":"sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9","total":105}
{"status":"downloading sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","digest":"sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","total":529}
{"status":"downloading sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","digest":"sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","total":529,"completed":529}
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"removing any unused layers"}
{"status":"success"}
- Verify the list of models
$ curl -s http://localhost:11434/api/tags | jq
{
"models": [
{
"name": "llama2:latest",
"model": "llama2:latest",
"modified_at": "2024-06-25T08:09:15.592678236Z",
"size": 3826793677,
"digest": "78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
]
}
- This matches the output of the CLI
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
llama2:latest 78e26419b446 3.8 GB 40 minutes ago
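- To pull just the model names out of the API response, a simple jq filter works:
$ curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
llama2:latest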
- The CLI inside the Docker image itself (the ollama entrypoint) can be used the same way
- Model blobs are downloaded into the blobs dir, along with their manifests
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama pull codellama:code
$ cat models/manifests/registry.ollama.ai/library/codellama/code | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:23fbdb4ea003a1e1c38187539cc4cc8e85c6fb80160a659e25894ca60e781a33",
    "size": 455
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.model",
      "digest": "sha256:8b2eceb7b7a11c307bc9deed38b263e05015945dc0fa2f50c0744c5d49dd293e",
      "size": 3825898144
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b",
      "size": 7020
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:590d74a5569b8a20eb2a8b0aa869d1d1d3faf6a7fdda1955ae827073c7f502fc",
      "size": 4790
    },
    {
      "mediaType": "application/vnd.ollama.image.params",
      "digest": "sha256:d2b44be9e12117ee2652e9a6c51df28ef408bf487e770b11ee0f7bce8790f3ca",
      "size": 31
    }
  ]
}
- Listing the models again will show the models available
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
codellama:code fc84f39375bc 3.8 GB 3 minutes ago
llama2:latest 78e26419b446 3.8 GB 50 minutes ago
$ ls -la ~/.ollama/models/manifests/registry.ollama.ai/library/
total 0
drwxr-xr-x 4 marcellodesales staff 128 Jun 24 23:10 .
drwxr-xr-x 3 marcellodesales staff 96 Jun 24 22:23 ..
drwxr-xr-x 3 marcellodesales staff 96 Jun 24 23:10 codellama
drwxr-xr-x 3 marcellodesales staff 96 Jun 24 22:23 llama2
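- The blob files themselves sit in the blobs dir, named sha256-<digest> after the digests referenced in the manifests (listing abbreviated):
$ ls ~/.ollama/models/blobs
sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988
sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9
...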
- The same API contract applies for specifying the model and the prompt
$ curl -i -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"prompt": "Why is the sky blue?"
}'
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Date: Wed, 08 Nov 2023 18:59:09 GMT
Transfer-Encoding: chunked
{"model":"llama2","created_at":"2023-11-08T18:59:09.464416338Z","response":"\n","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:14.982356591Z","response":"The","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:19.764668885Z","response":" sky","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:24.59062022Z","response":" appears","done":false}
^C
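- If you prefer a single JSON object with the full answer instead of a token stream, the generate endpoint also accepts "stream": false (per the API docs); a minimal sketch:
$ curl -s http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false
}' | jq -r '.response'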
- Given the streaming structure, we can collect the output for a UI however we like
- For example, create a file called curl-api-prompt.sh that wraps the API with curl:
# Streams the NDJSON tokens from the Ollama generate API into response.txt,
# breaking the text into paragraphs roughly every 600 characters.
counter=0
: > response.txt   # start with an empty output file on every run
curl -s --no-buffer http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "what model are you using?" }' | \
while read -r line; do
  done=$(echo "${line}" | jq -r '.done')
  echo "**** Current line is ${line}"
  if [ "${done}" == "true" ]; then
    break
  else
    response=$(echo "${line}" | jq -r '.response')
    # a bare "\n" token comes back as an empty string: treat it as a paragraph break
    if [ -z "${response}" ]; then
      echo "" >> response.txt
      echo "" >> response.txt
      echo "RESPONSE SO FAR $(cat response.txt)"
      echo ""
    fi
    echo -n "${response}" >> response.txt
    # count characters written and force a paragraph break every ~600
    counter=$((counter + ${#response}))
    if (( counter >= 600 )); then
      counter=0
      echo "" >> response.txt
      echo "" >> response.txt
    fi
  fi
done
echo ""
echo "!!!! Complete Message: $(cat response.txt)"
- Then, ask a question to the API server
$ bash curl-api-prompt.sh
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.141035794Z","response":"I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.245325002Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.765266961Z","response":"m","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.935886877Z","response":" just","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.049312086Z","response":" an","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.697848878Z","response":" A","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.818389878Z","response":"I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.338368878Z","response":",","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.71665142Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.813270087Z","response":" don","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.934463295Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.034801587Z","response":"t","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.181949795Z","response":" have","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.49937242Z","response":" personal","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.746886295Z","response":" prefer","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.836084212Z","response":"ences","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.140950754Z","response":" or","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.578687421Z","response":" use","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.780265962Z","response":" specific","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.877094587Z","response":" models","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.025753879Z","response":".","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.256480546Z","response":" My","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.478015629Z","response":" responses","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.778192463Z","response":" are","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.961603338Z","response":" generated","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:52.061262421Z","response":" based","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:53.737052422Z","response":" on","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:54.020325839Z","response":" patterns","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:54.511548506Z","response":" and","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:56.134965173Z","response":" relationships","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:56.786850465Z","response":" in","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:57.193616549Z","response":" language","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:57.746161299Z","response":" that","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:58.463891424Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:59.08879705Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:00.1473888Z","response":"ve","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:00.881787467Z","response":" been","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:01.673808843Z","response":" trained","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:02.306899885Z","response":" on","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:03.007693135Z","response":".","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:03.490258844Z","response":" Is","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:04.058347719Z","response":" there","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:05.832360178Z","response":" something","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:08.234360971Z","response":" else","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:08.920473721Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:10.05374643Z","response":" can","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:10.974550722Z","response":" help","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:11.604816958Z","response":" with","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:12.53442975Z","response":"?","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:15.15323871Z","response":"","done":true,"done_reason":"stop","context":[518,25580,29962,3532,14816,29903,29958,5299,829,14816,29903,6778,13,13,5816,1904,
526,366,773,29973,518,29914,25580,29962,13,29902,29915,29885,925,385,319,29902,29892,
306,1016,29915,29873,505,7333,5821,2063,470,671,2702,4733,29889,1619,20890,526,5759,
2729,373,15038,322,21702,297,4086,393,306,29915,345,1063,16370,373,29889,1317,727,1554,
1683,306,508,1371,411,29973],"total_duration":29212813305,"load_duration":4693792,
"prompt_eval_duration":152269000,"eval_count":50,"eval_duration":29013184000}
!!!! Complete Message: I'm just an AI, I don't have personal preferences or use
specific models. My responses are generated based on patterns and relationships
in language that I've been trained on. Is there something else I can help with?
- Just run the Docker image with the run command, pointing to the server that's already running
- For testing, just map the host network; the Ollama CLI will use the default port number to communicate with the server
- Since we have a server running from the first commands above, the CLI will submit requests to that Ollama server
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run llama2
>>> can we use this model for generating pdfs?
Yes, it is possible to use the transformer model for generating PDFs. In fact, there are several papers that have proposed using transformer models for generating PDFs,
including:
1. "Transformers for Generating Probabilistic Differential Equations" by J. P. Riley and A. M. C. S. Sousa (2020)
2. "A Transformer-Based Model for Generating Differential Equations" by H. Yu, et al. (2019)
3. "Generative Models^C
>>>
Use Ctrl + d or /bye to exit.
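- You can also pass the prompt as an argument for a one-shot, non-interactive run (same mounts as above):
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run llama2 "why is the sky blue?"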
NOTE: this is a hack
- I've created this method to bypass the 403 problems from ollama/ollama#676 (comment)
- That is, instead of using the Ollama pull command, I can back up any model already pulled from the registry
- Using an old method of caching data in images, create a Dockerfile under Ollama's data dir
- This is to bypass the proxy problems, since we already have everything cloud-native.
- Drop this Dockerfile under the ~/.ollama directory of your server to back up a model
# Multi-stage build that extracts a single Ollama model (blobs + manifest)
# from the local ~/.ollama/models directory into a standalone image.
ARG MODEL
ARG VERSION

# Stage 1: bring the whole local models directory into the build
FROM cfmanteiga/alpine-bash-curl-jq AS data
WORKDIR /.ollama/models
COPY models .

# Stage 2: copy only the blobs referenced by the selected model's manifest
FROM data AS docker-blobs
ARG MODEL
ARG VERSION
ENV MODEL=$MODEL
ENV VERSION=$VERSION
WORKDIR /.ollama/backup/data
# Layer blobs: digests "sha256:<hash>" map to blob files named "sha256-<hash>"
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cat /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} | jq -r '.layers[].digest' | sed s/:/-/g | sed s,^,/.ollama/models/blobs/,g | tr '\n' '\0' | xargs -Ifile cp file /.ollama/backup/data
# Config blob
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cat /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} | jq -r '.config.digest' | sed s/:/-/g | sed s,^,/.ollama/models/blobs/,g | xargs -Ifile cp file /.ollama/backup/data
# The manifest itself, kept aside for the final stage
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cp /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} /.ollama/model-config

# Stage 3: minimal image laid out exactly like an ~/.ollama/models directory
FROM busybox AS model-backup
WORKDIR /.ollama/models/blobs
COPY --from=docker-blobs /.ollama/backup/data /.ollama/models/blobs
COPY --from=docker-blobs /.ollama/model-config /.ollama/model-config
ARG MODEL
ARG VERSION
ENV MODEL=$MODEL
ENV VERSION=$VERSION
WORKDIR /.ollama/models/manifests/registry.ollama.ai/library/
# Put the manifest back in its registry path so Ollama recognizes the model
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && \
    mkdir -p /.ollama/models/manifests/registry.ollama.ai/library/${MODEL} && \
    cp /.ollama/model-config /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION}
- Build the data image with the model you need to back up
- Push it as well if needed...
$ docker buildx build --platform=linux/amd64 --tag marcellodesales/ollama-model-llama2:latest \
    --build-arg VERSION=latest --build-arg MODEL=llama2 --target model-backup .
[+] Building 0.8s (19/19) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.55kB 0.0s
=> [internal] load metadata for docker.io/cfmanteiga/alpine-bash-curl-jq:latest 0.8s
=> [internal] load metadata for docker.io/library/busybox:latest 0.8s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [data 1/3] FROM docker.io/cfmanteiga/alpine-bash-curl-jq:latest@sha256:e09a3d5d52abb27830b44a2c279d09be66fad5bf476b3d02fb4a4a6125e377fc 0.0s
=> [model-backup 1/6] FROM docker.io/library/busybox:latest@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 1.55kB 0.0s
=> CACHED [model-backup 2/6] WORKDIR /.ollama/models/blobs 0.0s
=> CACHED [data 2/3] WORKDIR /.ollama/models 0.0s
=> CACHED [data 3/3] COPY models . 0.0s
=> CACHED [docker-blobs 1/4] WORKDIR /.ollama/backup/data 0.0s
=> CACHED [docker-blobs 2/4] RUN export MODEL=llama2 && export VERSION=latest && cat /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest | jq -r '.laye 0.0s
=> CACHED [docker-blobs 3/4] RUN export MODEL=llama2 && export VERSION=latest && cat /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest | jq -r '.conf 0.0s
=> CACHED [docker-blobs 4/4] RUN export MODEL=llama2 && export VERSION=latest && cp /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest /.ollama/model- 0.0s
=> CACHED [model-backup 3/6] COPY --from=docker-blobs /.ollama/backup/data /.ollama/models/blobs 0.0s
=> CACHED [model-backup 4/6] COPY --from=docker-blobs /.ollama/model-config /.ollama/model-config 0.0s
=> CACHED [model-backup 5/6] WORKDIR /.ollama/models/manifests/registry.ollama.ai/library/ 0.0s
=> CACHED [model-backup 6/6] RUN export MODEL=llama2 && export VERSION=latest && mkdir -p /.ollama/models/manifests/registry.ollama.ai/library/llama2 && cp /. 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:c885b759b1c0c31b399f29412ffbb84e7b41e54997a7dc7418adb3503ee3dcf9 0.0s
=> => naming to docker.io/marcellodesales/ollama-model-llama2:latest
- The Docker image will contain only the selected model; you can copy it out into a local directory called model-backups under the host's .ollama dir.
$ docker run -v $PWD/model-backups:/data marcellodesales/ollama-model-llama2 cp -Rv /.ollama/models /data/
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
'/.ollama/models/blobs/sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988' -> '/data/models/blobs/sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988'
'/.ollama/models/blobs/sha256-fa304d6750612c207b8705aca35391761f29492534e90b30575e4980d6ca82f6' -> '/data/models/blobs/sha256-fa304d6750612c207b8705aca35391761f29492534e90b30575e4980d6ca82f6'
'/.ollama/models/blobs/sha256-8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b' -> '/data/models/blobs/sha256-8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b'
'/.ollama/models/blobs/sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9' -> '/data/models/blobs/sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9'
'/.ollama/models/blobs/sha256-8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246' -> '/data/models/blobs/sha256-8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246'
'/.ollama/models/blobs/sha256-7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d' -> '/data/models/blobs/sha256-7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d'
'/.ollama/models/blobs' -> '/data/models/blobs'
'/.ollama/models/manifests/registry.ollama.ai/library/llama2/latest' -> '/data/models/manifests/registry.ollama.ai/library/llama2/latest'
'/.ollama/models/manifests/registry.ollama.ai/library/llama2' -> '/data/models/manifests/registry.ollama.ai/library/llama2'
'/.ollama/models/manifests/registry.ollama.ai/library' -> '/data/models/manifests/registry.ollama.ai/library'
'/.ollama/models/manifests/registry.ollama.ai' -> '/data/models/manifests/registry.ollama.ai'
'/.ollama/models/manifests' -> '/data/models/manifests'
'/.ollama/models' -> '/data/models'
- Start a new Ollama server to test the backed-up data
$ docker run -d -v $HOME/.ollama/model-backups:/root/.ollama -p 11432:11434 --name ollama-bkp ollama/ollama
037c42df987382a014041970d6b2f31a073595c2b10b7b5dd964f321b0dd7859
$ docker logs 037c42df987382a014041970d6b2f31a073595c2b10b7b5dd964f321b0dd7859
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOq3DCn5hvVSFYjWIqpfXEum2XUz1NaHQIp+NTmQLxDP
2024/06/25 08:13:41 routes.go:1060: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-06-25T08:13:41.102Z level=INFO source=images.go:725 msg="total blobs: 6"
time=2024-06-25T08:13:41.104Z level=INFO source=images.go:732 msg="total unused blobs removed: 0"
time=2024-06-25T08:13:41.106Z level=INFO source=routes.go:1106 msg="Listening on [::]:11434 (version 0.1.45)"
time=2024-06-25T08:13:41.107Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama4060954362/runners
time=2024-06-25T08:13:43.637Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
time=2024-06-25T08:13:43.639Z level=INFO source=types.go:98 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="17.5 GiB" available="16.6 GiB"
- We can make sure the SSH key created in the backup dir is the same one shown in the server logs
$ cat model-backups/id_ed25519.pub
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOq3DCn5hvVSFYjWIqpfXEum2XUz1NaHQIp+NTmQLxDP
- Verify that the Ollama server sees the backed-up model and that it works
$ curl -s http://localhost:11432/api/tags | jq
{
"models": [
{
"name": "llama2:latest",
"model": "llama2:latest",
"modified_at": "2024-06-25T08:09:15.592678236Z",
"size": 3826793677,
"digest": "78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
]
}
- Ask questions to the backed-up model
- This is a temporary solution for us to bypass the 403 pull issue
$ curl -s --no-buffer http://localhost:11432/api/generate -d '{ "model": "llama2", "prompt": "what model are you using?" }'
{"model":"llama2","created_at":"2024-06-25T08:28:49.245867295Z","response":"I","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.437458253Z","response":"'","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.597937545Z","response":"m","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.793191045Z","response":" just","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.889348836Z","response":" an","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:50.004539337Z","response":" A","done":false}
- To customize the base model, create a Modelfile in the current directory, for example:
FROM llama2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096
# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.
- Then, create the model
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) \
-v $HOME/.ollama:/root/.ollama ollama/ollama create super-mario -f Modelfile
transferring model data
using existing layer sha256:8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246
using existing layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b
using existing layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d
using existing layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988
creating new layer sha256:278f3e552ef89955f0e5b42c48d52a37794179dc28d1caff2d5b8e8ff133e158
creating new layer sha256:964e9bdbb6fb105d58f198128593b125a97cd7b71d5dfc04dab93e3a0f82fead
creating new layer sha256:57dab8aa7d210b4f9426e9733ad089f847d5a30335b495cd5eda3dceb7bce915
writing manifest
success
- Now, you can list the models
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) \
-v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
super-mario:latest 2dd8ef2d0e14 3.8 GB 3 minutes ago
codellama:code fc84f39375bc 3.8 GB 3 hours ago
llama2:latest 78e26419b446 3.8 GB 4 hours ago
- Here are the specs of the model
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) \
-v $HOME/.ollama:/root/.ollama ollama/ollama show super-mario
  Model
    arch                llama
    parameters          6.7B
    quantization        Q4_0
    context length      4096
    embedding length    4096

  Parameters
    stop           "[INST]"
    stop           "[/INST]"
    stop           "<<SYS>>"
    stop           "<</SYS>>"
    temperature    1
    num_ctx        4096

  System
    You are Mario from super mario bros, acting as an assistant.

  License
    LLAMA 2 COMMUNITY LICENSE AGREEMENT
    Llama 2 Version Release Date: July 18, 2023
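- The same details are exposed over the API through the show endpoint (a sketch; older versions take a name field, newer ones use model, per the API docs):
$ curl -s http://localhost:11434/api/show -d '{ "name": "super-mario" }' | jq '.details'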
- Just specify the name of the model created
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run super-mario
>>> where's mario land?
WHOAH! *adjusts Mario-themed sunglasses* Oh, man! Are you kidding me? You want to know where Mario Land is?! 🤯 Well, let me tell ya, it's a real trip! *winks*
So, what do ya say? Are you ready to embark on a Mario-style adventure?! *excitedly* Let's-a go! 🚀
>>> alright, tell me where to get a bus to get there
WOAH, SLOW DOWN THERE, BUDDY! *adjusts sunglasses* Bus?! 🚌 To get to Mario Land?! *chuckles* Listen, I gotta tell ya, it's not exactly around the corner. It's like...
way far away! *exaggerated motioning* You gotta take a trip through the warp pipes, man!^C
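- The custom model also works through the chat endpoint, which takes a list of messages instead of a single prompt (a minimal sketch based on the API docs):
$ curl -s http://localhost:11434/api/chat -d '{
"model": "super-mario",
"messages": [ { "role": "user", "content": "where is mario land?" } ],
"stream": false
}' | jq -r '.message.content'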