
@tjwebb
Created February 8, 2025 21:20
Ollama + Vision Example
from langchain_ollama import OllamaLLM
from langchain_core.messages import HumanMessage
import base64

text_prompt = """
You are a robot for a homeowners insurance underwriter.
You observe and record the physical characteristics of residential property from aerial imagery.
Response Format: JSON
Fill out the following JSON object that contains the following attributes of the given image.
Do not include any other text. Do not include introduction or explanation. Return JSON only.
For the "hazards" field, include anything that could potentially damage the roof or surrounding property.
Return: object
Fields:
roof_color: <string>
roof_shape: <string>
roof_primary_material: <string>
roof_architecture_style: <string>
roof_maintenance_condition: <string>
hazards: [ <string> ]
roof_chimneys: <integer>
roof_dormers: <integer>
"""

# Read the test image and base64-encode it for the Ollama API.
with open('testhouse.png', 'rb') as test_image:
    image_data = base64.b64encode(test_image.read()).decode('utf-8')

model = OllamaLLM(model="llama3.2-vision:11b-instruct-fp16")
# Ollama accepts base64-encoded images bound directly to the model.
model_with_image = model.bind(images=[image_data])

# NOTE: this HumanMessage is not used; passing the image through message
# content did not work with OllamaLLM (see the comments below).
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_data}"},
        },
    ],
)
#print(message)

print(model_with_image.invoke(text_prompt))
@tjwebb (Author) commented Feb 9, 2025

Works every time with this model. (Note that the HumanMessage block is not used. I could not get the image prompt to work this way)
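For anyone who wants the message-content route anyway: a minimal sketch using ChatOllama, the chat wrapper in langchain_ollama, which does consume multimodal message content (the data-URL format follows LangChain's multimodal examples; whether this specific model honors it is untested here, so treat it as an assumption):

from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage

# Sketch: ChatOllama (not OllamaLLM) accepts image content inside messages.
chat = ChatOllama(model="llama3.2-vision:11b-instruct-fp16")
message = HumanMessage(
    content=[
        {"type": "text", "text": text_prompt},
        # Assumes the same base64 string `image_data` built in the gist above.
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_data}"}},
    ],
)
print(chat.invoke([message]).content)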

With other models like minicpm-v it's less reliable, and the model really wants to be chatty before/after the JSON response.
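One workaround for chatty models is to pull the first JSON object out of the reply before parsing. A minimal sketch (the fence-stripping is an assumption about how these models tend to wrap their answers):

import json
import re

def extract_json(reply: str) -> dict:
    # Strip markdown code fences some models wrap around the JSON.
    reply = re.sub(r"```(?:json)?", "", reply)
    # Take the outermost {...} span and parse it, ignoring chatter around it.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start:end + 1])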

@bmf commented Feb 10, 2025

You might want to look at Pydantic for getting consistent output. This video talks about using it with CrewAI, but it should be pretty close to what you are doing here. It might work with your smaller model.
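A minimal sketch of what that might look like here, using the fields from the prompt above (plain Pydantic v2, no CrewAI; raises ValidationError if the model's reply is missing or mistypes a field):

from pydantic import BaseModel

class RoofReport(BaseModel):
    roof_color: str
    roof_shape: str
    roof_primary_material: str
    roof_architecture_style: str
    roof_maintenance_condition: str
    hazards: list[str]
    roof_chimneys: int
    roof_dormers: int

# Validate the raw JSON reply from the gist's invoke() call.
report = RoofReport.model_validate_json(model_with_image.invoke(text_prompt))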

@tjwebb (Author) commented Feb 10, 2025

I guess I don't really need langchain here...
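A sketch of the same call without langchain, using the ollama Python client directly; format="json" asks Ollama to constrain the reply to valid JSON (assumes the same text_prompt and testhouse.png as above):

import ollama

# The generate endpoint takes raw image bytes (or base64 strings) directly.
with open('testhouse.png', 'rb') as f:
    image_bytes = f.read()

response = ollama.generate(
    model="llama3.2-vision:11b-instruct-fp16",
    prompt=text_prompt,
    images=[image_bytes],
    format="json",  # ask Ollama to emit JSON only
)
print(response["response"])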

@bmf commented Feb 10, 2025
If you're suggesting that you're considering CrewAI, it's pretty straightforward, although I do believe it uses langchain under the hood (or at least it used to). Pydantic works well with it. I haven't had much luck with local models, but that's a hardware issue on my end; the OpenAI API has made it pretty easy to get around that, though.
