- I have a large machine with 2 GPUs and a considerable amount of RAM.
- I was trying to use ollama to serve llava and mistral, but it would reload the models every time I switched between them in my requests.
- So this is the solution that appears to be working: multiple containers, each serving a different model on a different port (see the sketch after this list).
- Since I have many models already downloaded on my machine, I mount the host ollama working dir into each container.
- Linux (at least on my Linux machine): `/usr/share/ollama/.ollama`
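
Here is a minimal sketch of that setup, assuming the official `ollama/ollama` image, the NVIDIA Container Toolkit for `--gpus`, and Ollama's default in-container model dir `/root/.ollama`. The container names (`ollama-llava`, `ollama-mistral`), the host ports, and the one-GPU-per-container pinning are my own choices for illustration, not a confirmed part of the original setup:

```sh
# llava container: pinned to GPU 0, exposed on host port 11434
docker run -d --name ollama-llava \
  --gpus '"device=0"' \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# mistral container: pinned to GPU 1, exposed on host port 11435
# (inside the container, ollama still listens on its default 11434)
docker run -d --name ollama-mistral \
  --gpus '"device=1"' \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  -p 11435:11434 \
  ollama/ollama
```

Each container then answers on its own port, so switching models is just switching ports and neither container ever has to evict the other's model:

```sh
curl http://localhost:11434/api/generate -d '{"model": "llava", "prompt": "hi"}'
curl http://localhost:11435/api/generate -d '{"model": "mistral", "prompt": "hi"}'
```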