- API: https://github.com/ollama/ollama/blob/main/docs/api.md
- Docker: https://github.com/ollama/ollama/blob/main/docs/docker.md
Warning
Running Ollama from a Docker container on a macOS M3 machine will not use the GPU; Docker cannot access Apple Silicon GPUs, so inference falls back to the CPU
$ docker run -d -v $HOME/.ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
9d2823c57c7ebc57c47456afa05e74a915c64c59c2904b7b5b4cc60a238f1b02
- If you look at the logs, you will notice that only the CPU is used
...
...
ollama-server | time=2025-01-11T01:17:34.798Z level=INFO source=routes.go:1310 msg="Listening on [::]:11434 (version 0.5.4-0-g2ddc32d-dirty)"
ollama-server | time=2025-01-11T01:17:34.799Z level=INFO source=routes.go:1339 msg="Dynamic LLM libraries" runners="[cpu cuda_jetpack5 cuda_jetpack6 cuda_v11 cuda_v12]"
ollama-server | time=2025-01-11T01:17:34.799Z level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
ollama-server | time=2025-01-11T01:17:34.800Z level=INFO source=gpu.go:392 msg="no compatible GPUs were discovered"
ollama-server | time=2025-01-11T01:17:34.801Z level=INFO source=types.go:131 msg="inference compute" id=0 library=cpu variant="no vector extensions" compute="" driver=0.0 name="" total="17.5 GiB" available="15.9 GiB"
- Running Ollama natively on the M3 works, but you must run it outside of Docker
Warning
You must install Ollama from their website
$ OLLAMA_HOST=0.0.0.0:11234 ollama serve
...
...
time=2025-01-10T17:16:24.333-08:00 level=INFO source=routes.go:1310 msg="Listening on [::]:11234 (version 0.5.4)"
time=2025-01-10T17:16:24.333-08:00 level=INFO source=routes.go:1339 msg="Dynamic LLM libraries" runners=[metal]
time=2025-01-10T17:16:24.383-08:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="36.0 GiB" available="36.0 GiB"
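- As a quick sanity check against the custom port, the version endpoint from the API docs should report the same version shown in the startup log:
$ curl -s http://localhost:11234/api/version
{"version":"0.5.4"}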
- Ollama Versions: https://hub.docker.com/r/ollama/ollama/tags
- Pulling a model is very similar to pulling a Docker image, but it uses Ollama's own image structure (manifests and blobs)
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama pull llama2
pulling manifest
pulling 8934d96d3f08... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 59 B
pulling fa304d675061... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 91 B
pulling 42ba7f8a01dd... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
llama2:latest 78e26419b446 3.8 GB 40 minutes ago
- The local Ollama registry dir, which is mapped into the container as a volume
$ find ~/.ollama -name llama2
/Users/marcellodesales/.ollama/models/manifests/registry.ollama.ai/library/llama2
- Verify the running container, its port mappings, etc.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
13354dd76274 ollama/ollama "/bin/ollama serve" 30 hours ago Up 30 hours 0.0.0.0:11434->11434/tcp ollama
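- A quick way to confirm the API is reachable on that port is the root endpoint, which simply reports the server status:
$ curl -s http://localhost:11434/
Ollama is running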
- NOTE: If you want to make this production-ready, this API should be protected (e.g., with OAuth) or disabled altogether.
- NOTE 2: This is a very slow operation because of the streamed JSON progress output... It is mainly useful if you are building a UI on top of it
$ curl -X POST http://localhost:11434/api/pull -d '{
"name": "llama2:latest"
}'
…
…
…
{"status":"downloading sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9","digest":"sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9","total":105}
{"status":"downloading sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","digest":"sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","total":529}
{"status":"downloading sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","digest":"sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","total":529,"completed":529}
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"removing any unused layers"}
{"status":"success"}
- Verify the list of models
$ curl -s http://localhost:11434/api/tags | jq
{
"models": [
{
"name": "llama2:latest",
"model": "llama2:latest",
"modified_at": "2024-06-25T08:09:15.592678236Z",
"size": 3826793677,
"digest": "78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
]
}
- This matches the output of the CLI
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
llama2:latest 78e26419b446 3.8 GB 40 minutes ago
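- To pull just the model names out of the API response, a simple jq filter works:
$ curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
llama2:latest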
- The CLI inside the Docker image itself (the ollama entrypoint) can be used the same way
- Model blobs are downloaded into the blobs dir, along with their manifests
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama pull codellama:code
$ cat models/manifests/registry.ollama.ai/library/codellama/code | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:23fbdb4ea003a1e1c38187539cc4cc8e85c6fb80160a659e25894ca60e781a33",
    "size": 455
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.model",
      "digest": "sha256:8b2eceb7b7a11c307bc9deed38b263e05015945dc0fa2f50c0744c5d49dd293e",
      "size": 3825898144
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b",
      "size": 7020
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:590d74a5569b8a20eb2a8b0aa869d1d1d3faf6a7fdda1955ae827073c7f502fc",
      "size": 4790
    },
    {
      "mediaType": "application/vnd.ollama.image.params",
      "digest": "sha256:d2b44be9e12117ee2652e9a6c51df28ef408bf487e770b11ee0f7bce8790f3ca",
      "size": 31
    }
  ]
}
- Listing the models again will show the models available
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
codellama:code fc84f39375bc 3.8 GB 3 minutes ago
llama2:latest 78e26419b446 3.8 GB 50 minutes ago
$ ls -la ~/.ollama/models/manifests/registry.ollama.ai/library/
total 0
drwxr-xr-x 4 marcellodesales staff 128 Jun 24 23:10 .
drwxr-xr-x 3 marcellodesales staff 96 Jun 24 22:23 ..
drwxr-xr-x 3 marcellodesales staff 96 Jun 24 23:10 codellama
drwxr-xr-x 3 marcellodesales staff 96 Jun 24 22:23 llama2
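- The blob files themselves sit in the blobs dir, named sha256-<digest> after the digests referenced in the manifests (listing abbreviated):
$ ls ~/.ollama/models/blobs
sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988
sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9
...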
- The same API contract applies for specifying the model and the prompt
$ curl -i -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"prompt": "Why is the sky blue?"
}'
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Date: Wed, 08 Nov 2023 18:59:09 GMT
Transfer-Encoding: chunked
{"model":"llama2","created_at":"2023-11-08T18:59:09.464416338Z","response":"\n","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:14.982356591Z","response":"The","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:19.764668885Z","response":" sky","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:24.59062022Z","response":" appears","done":false}
^C
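- If you prefer a single JSON object with the full answer instead of a token stream, the generate endpoint also accepts "stream": false (per the API docs); a minimal sketch:
$ curl -s http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Why is the sky blue?",
"stream": false
}' | jq -r '.response'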
- Given the streaming structure, we can collect the output for a UI however we like
- For example, create a file called curl-api-prompt.sh that wraps the API with curl:
# Streams the NDJSON tokens from the Ollama generate API into response.txt,
# breaking the text into paragraphs roughly every 600 characters.
counter=0
: > response.txt   # start with an empty output file on every run
curl -s --no-buffer http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "what model are you using?" }' | \
while read -r line; do
  done=$(echo "${line}" | jq -r '.done')
  echo "**** Current line is ${line}"
  if [ "${done}" == "true" ]; then
    break
  else
    response=$(echo "${line}" | jq -r '.response')
    # a bare "\n" token comes back as an empty string: treat it as a paragraph break
    if [ -z "${response}" ]; then
      echo "" >> response.txt
      echo "" >> response.txt
      echo "RESPONSE SO FAR $(cat response.txt)"
      echo ""
    fi
    echo -n "${response}" >> response.txt
    # count characters written and force a paragraph break every ~600
    counter=$((counter + ${#response}))
    if (( counter >= 600 )); then
      counter=0
      echo "" >> response.txt
      echo "" >> response.txt
    fi
  fi
done
echo ""
echo "!!!! Complete Message: $(cat response.txt)"
- Then, ask a question to the API server
$ bash curl-api-prompt.sh
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.141035794Z","response":"I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.245325002Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.765266961Z","response":"m","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.935886877Z","response":" just","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.049312086Z","response":" an","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.697848878Z","response":" A","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.818389878Z","response":"I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.338368878Z","response":",","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.71665142Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.813270087Z","response":" don","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.934463295Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.034801587Z","response":"t","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.181949795Z","response":" have","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.49937242Z","response":" personal","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.746886295Z","response":" prefer","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.836084212Z","response":"ences","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.140950754Z","response":" or","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.578687421Z","response":" use","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.780265962Z","response":" specific","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.877094587Z","response":" models","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.025753879Z","response":".","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.256480546Z","response":" My","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.478015629Z","response":" responses","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.778192463Z","response":" are","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.961603338Z","response":" generated","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:52.061262421Z","response":" based","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:53.737052422Z","response":" on","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:54.020325839Z","response":" patterns","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:54.511548506Z","response":" and","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:56.134965173Z","response":" relationships","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:56.786850465Z","response":" in","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:57.193616549Z","response":" language","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:57.746161299Z","response":" that","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:58.463891424Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:59.08879705Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:00.1473888Z","response":"ve","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:00.881787467Z","response":" been","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:01.673808843Z","response":" trained","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:02.306899885Z","response":" on","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:03.007693135Z","response":".","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:03.490258844Z","response":" Is","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:04.058347719Z","response":" there","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:05.832360178Z","response":" something","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:08.234360971Z","response":" else","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:08.920473721Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:10.05374643Z","response":" can","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:10.974550722Z","response":" help","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:11.604816958Z","response":" with","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:12.53442975Z","response":"?","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:15.15323871Z","response":"","done":true,"done_reason":"stop","context":[518,25580,29962,3532,14816,29903,29958,5299,829,14816,29903,6778,13,13,5816,1904,
526,366,773,29973,518,29914,25580,29962,13,29902,29915,29885,925,385,319,29902,29892,
306,1016,29915,29873,505,7333,5821,2063,470,671,2702,4733,29889,1619,20890,526,5759,
2729,373,15038,322,21702,297,4086,393,306,29915,345,1063,16370,373,29889,1317,727,1554,
1683,306,508,1371,411,29973],"total_duration":29212813305,"load_duration":4693792,
"prompt_eval_duration":152269000,"eval_count":50,"eval_duration":29013184000}
!!!! Complete Message: I'm just an AI, I don't have personal preferences or use
specific models. My responses are generated based on patterns and relationships
in language that I've been trained on. Is there something else I can help with?
- Just run the Docker image with the run command, pointing to the server that's already running
- For testing, just map the host network; the Ollama CLI will use the default port number to communicate with the server
- Since we have a server running from the first commands above, the CLI will submit requests to that Ollama server
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run llama2
>>> can we use this model for generating pdfs?
Yes, it is possible to use the transformer model for generating PDFs. In fact, there are several papers that have proposed using transformer models for generating PDFs,
including:
1. "Transformers for Generating Probabilistic Differential Equations" by J. P. Riley and A. M. C. S. Sousa (2020)
2. "A Transformer-Based Model for Generating Differential Equations" by H. Yu, et al. (2019)
3. "Generative Models^C
>>>
Use Ctrl + d or /bye to exit.
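- You can also pass the prompt as an argument for a one-shot, non-interactive run (same mounts as above):
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run llama2 "why is the sky blue?"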
NOTE: this is a hack
- I've created this method to bypass the 403 problems from ollama/ollama#676 (comment)
- That is, instead of using the Ollama pull command, I can back up any model already pulled from the registry
- Using an old method of caching data in images, create a Dockerfile under Ollama's data dir
- This is to bypass the proxy problems, since we already have everything cloud-native.
- Drop this Dockerfile under the ~/.ollama directory of your server to back up a model
# Multi-stage build that extracts a single Ollama model (blobs + manifest)
# from the local ~/.ollama/models directory into a standalone image.
ARG MODEL
ARG VERSION

# Stage 1: bring the whole local models directory into the build
FROM cfmanteiga/alpine-bash-curl-jq AS data
WORKDIR /.ollama/models
COPY models .

# Stage 2: copy only the blobs referenced by the selected model's manifest
FROM data AS docker-blobs
ARG MODEL
ARG VERSION
ENV MODEL=$MODEL
ENV VERSION=$VERSION
WORKDIR /.ollama/backup/data
# Layer blobs: digests "sha256:<hash>" map to blob files named "sha256-<hash>"
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cat /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} | jq -r '.layers[].digest' | sed s/:/-/g | sed s,^,/.ollama/models/blobs/,g | tr '\n' '\0' | xargs -Ifile cp file /.ollama/backup/data
# Config blob
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cat /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} | jq -r '.config.digest' | sed s/:/-/g | sed s,^,/.ollama/models/blobs/,g | xargs -Ifile cp file /.ollama/backup/data
# The manifest itself, kept aside for the final stage
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cp /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} /.ollama/model-config

# Stage 3: minimal image laid out exactly like an ~/.ollama/models directory
FROM busybox AS model-backup
WORKDIR /.ollama/models/blobs
COPY --from=docker-blobs /.ollama/backup/data /.ollama/models/blobs
COPY --from=docker-blobs /.ollama/model-config /.ollama/model-config
ARG MODEL
ARG VERSION
ENV MODEL=$MODEL
ENV VERSION=$VERSION
WORKDIR /.ollama/models/manifests/registry.ollama.ai/library/
# Put the manifest back in its registry path so Ollama recognizes the model
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && \
    mkdir -p /.ollama/models/manifests/registry.ollama.ai/library/${MODEL} && \
    cp /.ollama/model-config /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION}
- Build the data image with the model you need to back up
- Push it as well if needed...
$ docker buildx build --platform=linux/amd64 --tag marcellodesales/ollama-model-llama2:latest \
    --build-arg VERSION=latest --build-arg MODEL=llama2 --target model-backup .
[+] Building 0.8s (19/19) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.55kB 0.0s
=> [internal] load metadata for docker.io/cfmanteiga/alpine-bash-curl-jq:latest 0.8s
=> [internal] load metadata for docker.io/library/busybox:latest 0.8s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [data 1/3] FROM docker.io/cfmanteiga/alpine-bash-curl-jq:latest@sha256:e09a3d5d52abb27830b44a2c279d09be66fad5bf476b3d02fb4a4a6125e377fc 0.0s
=> [model-backup 1/6] FROM docker.io/library/busybox:latest@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 1.55kB 0.0s
=> CACHED [model-backup 2/6] WORKDIR /.ollama/models/blobs 0.0s
=> CACHED [data 2/3] WORKDIR /.ollama/models 0.0s
=> CACHED [data 3/3] COPY models . 0.0s
=> CACHED [docker-blobs 1/4] WORKDIR /.ollama/backup/data 0.0s
=> CACHED [docker-blobs 2/4] RUN export MODEL=llama2 && export VERSION=latest && cat /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest | jq -r '.laye 0.0s
=> CACHED [docker-blobs 3/4] RUN export MODEL=llama2 && export VERSION=latest && cat /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest | jq -r '.conf 0.0s
=> CACHED [docker-blobs 4/4] RUN export MODEL=llama2 && export VERSION=latest && cp /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest /.ollama/model- 0.0s
=> CACHED [model-backup 3/6] COPY --from=docker-blobs /.ollama/backup/data /.ollama/models/blobs 0.0s
=> CACHED [model-backup 4/6] COPY --from=docker-blobs /.ollama/model-config /.ollama/model-config 0.0s
=> CACHED [model-backup 5/6] WORKDIR /.ollama/models/manifests/registry.ollama.ai/library/ 0.0s
=> CACHED [model-backup 6/6] RUN export MODEL=llama2 && export VERSION=latest && mkdir -p /.ollama/models/manifests/registry.ollama.ai/library/llama2 && cp /. 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:c885b759b1c0c31b399f29412ffbb84e7b41e54997a7dc7418adb3503ee3dcf9 0.0s
=> => naming to docker.io/marcellodesales/ollama-model-llama2:latest
- The Docker image will contain only the selected model; you can copy it out into a local directory called model-backups under the host's .ollama dir.
$ docker run -v $PWD/model-backups:/data marcellodesales/ollama-model-llama2 cp -Rv /.ollama/models /data/
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
'/.ollama/models/blobs/sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988' -> '/data/models/blobs/sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988'
'/.ollama/models/blobs/sha256-fa304d6750612c207b8705aca35391761f29492534e90b30575e4980d6ca82f6' -> '/data/models/blobs/sha256-fa304d6750612c207b8705aca35391761f29492534e90b30575e4980d6ca82f6'
'/.ollama/models/blobs/sha256-8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b' -> '/data/models/blobs/sha256-8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b'
'/.ollama/models/blobs/sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9' -> '/data/models/blobs/sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9'
'/.ollama/models/blobs/sha256-8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246' -> '/data/models/blobs/sha256-8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246'
'/.ollama/models/blobs/sha256-7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d' -> '/data/models/blobs/sha256-7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d'
'/.ollama/models/blobs' -> '/data/models/blobs'
'/.ollama/models/manifests/registry.ollama.ai/library/llama2/latest' -> '/data/models/manifests/registry.ollama.ai/library/llama2/latest'
'/.ollama/models/manifests/registry.ollama.ai/library/llama2' -> '/data/models/manifests/registry.ollama.ai/library/llama2'
'/.ollama/models/manifests/registry.ollama.ai/library' -> '/data/models/manifests/registry.ollama.ai/library'
'/.ollama/models/manifests/registry.ollama.ai' -> '/data/models/manifests/registry.ollama.ai'
'/.ollama/models/manifests' -> '/data/models/manifests'
'/.ollama/models' -> '/data/models'
- Start a new Ollama server to test the backed-up data
$ docker run -d -v $HOME/.ollama/model-backups:/root/.ollama -p 11432:11434 --name ollama-bkp ollama/ollama
037c42df987382a014041970d6b2f31a073595c2b10b7b5dd964f321b0dd7859
$ docker logs 037c42df987382a014041970d6b2f31a073595c2b10b7b5dd964f321b0dd7859
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOq3DCn5hvVSFYjWIqpfXEum2XUz1NaHQIp+NTmQLxDP
2024/06/25 08:13:41 routes.go:1060: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-06-25T08:13:41.102Z level=INFO source=images.go:725 msg="total blobs: 6"
time=2024-06-25T08:13:41.104Z level=INFO source=images.go:732 msg="total unused blobs removed: 0"
time=2024-06-25T08:13:41.106Z level=INFO source=routes.go:1106 msg="Listening on [::]:11434 (version 0.1.45)"
time=2024-06-25T08:13:41.107Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama4060954362/runners
time=2024-06-25T08:13:43.637Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
time=2024-06-25T08:13:43.639Z level=INFO source=types.go:98 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="17.5 GiB" available="16.6 GiB"
- We can make sure the SSH key created in the backup dir is the same one shown in the server logs
$ cat model-backups/id_ed25519.pub
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOq3DCn5hvVSFYjWIqpfXEum2XUz1NaHQIp+NTmQLxDP
- Verify that the Ollama server sees the backed-up model and that it works
$ curl -s http://localhost:11432/api/tags | jq
{
"models": [
{
"name": "llama2:latest",
"model": "llama2:latest",
"modified_at": "2024-06-25T08:09:15.592678236Z",
"size": 3826793677,
"digest": "78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
]
}
- Ask questions to the backed-up model
- This is a temporary solution for us to bypass the 403 pull issue
$ curl -s --no-buffer http://localhost:11432/api/generate -d '{ "model": "llama2", "prompt": "what model are you using?" }'
{"model":"llama2","created_at":"2024-06-25T08:28:49.245867295Z","response":"I","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.437458253Z","response":"'","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.597937545Z","response":"m","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.793191045Z","response":" just","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.889348836Z","response":" an","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:50.004539337Z","response":" A","done":false}
- To customize the base model, create a Modelfile in the current directory, for example:
FROM llama2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096
# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.
- Then, create the model
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) \
-v $HOME/.ollama:/root/.ollama ollama/ollama create super-mario -f Modelfile
transferring model data
using existing layer sha256:8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246
using existing layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b
using existing layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d
using existing layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988
creating new layer sha256:278f3e552ef89955f0e5b42c48d52a37794179dc28d1caff2d5b8e8ff133e158
creating new layer sha256:964e9bdbb6fb105d58f198128593b125a97cd7b71d5dfc04dab93e3a0f82fead
creating new layer sha256:57dab8aa7d210b4f9426e9733ad089f847d5a30335b495cd5eda3dceb7bce915
writing manifest
success
- Now, you can list the models
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) \
-v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME ID SIZE MODIFIED
super-mario:latest 2dd8ef2d0e14 3.8 GB 3 minutes ago
codellama:code fc84f39375bc 3.8 GB 3 hours ago
llama2:latest 78e26419b446 3.8 GB 4 hours ago
- Here are the specs of the model
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) \
-v $HOME/.ollama:/root/.ollama ollama/ollama show super-mario
  Model
    arch                llama
    parameters          6.7B
    quantization        Q4_0
    context length      4096
    embedding length    4096

  Parameters
    stop           "[INST]"
    stop           "[/INST]"
    stop           "<<SYS>>"
    stop           "<</SYS>>"
    temperature    1
    num_ctx        4096

  System
    You are Mario from super mario bros, acting as an assistant.

  License
    LLAMA 2 COMMUNITY LICENSE AGREEMENT
    Llama 2 Version Release Date: July 18, 2023
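- The same details are exposed over the API through the show endpoint (a sketch; older versions take a name field, newer ones use model, per the API docs):
$ curl -s http://localhost:11434/api/show -d '{ "name": "super-mario" }' | jq '.details'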
- Just specify the name of the model created
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run super-mario
>>> where's mario land?
WHOAH! *adjusts Mario-themed sunglasses* Oh, man! Are you kidding me? You want to know where Mario Land is?! 🤯 Well, let me tell ya, it's a real trip! *winks*
So, what do ya say? Are you ready to embark on a Mario-style adventure?! *excitedly* Let's-a go! 🚀
>>> alright, tell me where to get a bus to get there
WOAH, SLOW DOWN THERE, BUDDY! *adjusts sunglasses* Bus?! 🚌 To get to Mario Land?! *chuckles* Listen, I gotta tell ya, it's not exactly around the corner. It's like...
way far away! *exaggerated motioning* You gotta take a trip through the warp pipes, man!^C
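- The custom model also works through the chat endpoint, which takes a list of messages instead of a single prompt (a minimal sketch based on the API docs):
$ curl -s http://localhost:11434/api/chat -d '{
"model": "super-mario",
"messages": [ { "role": "user", "content": "where is mario land?" } ],
"stream": false
}' | jq -r '.message.content'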