- I have a large machine with 2 GPUs and a considerable amount of RAM.
- I was trying to use ollama to serve llava and mistral, but it would reload the models every time I switched between them in my requests.
- So this is the solution that appears to be working: multiple containers, each serving a different model on a different port (see the sketch after this list).
- Since I have many models already downloaded on my machine, I mount the host ollama working dir into each container.
- Linux (at least on my Linux machine): `/usr/share/ollama/.ollama`
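
Here is a minimal sketch of that setup, assuming the official `ollama/ollama` image, the NVIDIA Container Toolkit for `--gpus`, and Ollama's default in-container model dir `/root/.ollama`. The container names (`ollama-llava`, `ollama-mistral`), the host ports, and the one-GPU-per-container pinning are my own choices for illustration, not a confirmed part of the original setup:

```sh
# llava container: pinned to GPU 0, exposed on host port 11434
docker run -d --name ollama-llava \
  --gpus '"device=0"' \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# mistral container: pinned to GPU 1, exposed on host port 11435
# (inside the container, ollama still listens on its default 11434)
docker run -d --name ollama-mistral \
  --gpus '"device=1"' \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  -p 11435:11434 \
  ollama/ollama
```

Each container then answers on its own port, so switching models is just switching ports and neither container ever has to evict the other's model:

```sh
curl http://localhost:11434/api/generate -d '{"model": "llava", "prompt": "hi"}'
curl http://localhost:11435/api/generate -d '{"model": "mistral", "prompt": "hi"}'
```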