from langchain_ollama import OllamaLLM
from langchain_core.messages import HumanMessage, SystemMessage
import base64

# Prompt instructing the model to fill out a fixed JSON schema describing the roof.
text_prompt = """
You are a robot for a homeowners insurance underwriter.
You observe and record the physical characteristics of residential property from aerial imagery.
Response Format: JSON
Fill out the following JSON object that contains the following attributes of the given image.
Do not include any other text. Do not include introduction or explanation. Return JSON only.
For the "hazards" field, include anything that could potentially damage the roof or surrounding property.
Return: object
Fields:
roof_color: <string>
roof_shape: <string>
roof_primary_material: <string>
roof_architecture_style: <string>
roof_maintenance_condition: <string>
hazards: [ <string> ]
roof_chimneys: <integer>
roof_dormers: <integer>
"""

# Read the test image and base64-encode it as a UTF-8 string for Ollama.
with open('testhouse.png', 'rb') as test_image:
    image_data = base64.b64encode(test_image.read()).decode('utf-8')

model = OllamaLLM(model="llama3.2-vision:11b-instruct-fp16")

# Attach the image to every call by binding it to the model.
model_with_image = model.bind(images=[image_data])

# Unused alternative: pass the image inside a HumanMessage content block.
# (Could not get the image prompt to work this way; see the note below.)
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image;base64,{image_data}"},
        },
    ],
)

print(model_with_image.invoke(text_prompt))
Works every time with this model. (Note that the HumanMessage block is unused; I could not get the image prompt to work that way.)
With other models like minicpm-v it's less reliable, and the model really wants to be chatty before/after the JSON response.
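For the chatty models, one workaround is to slice the first JSON object out of the reply before parsing. A minimal sketch (the extract_json helper is my own, not part of the gist):

import json

def extract_json(reply: str) -> dict:
    """Pull the first {...} span out of a chatty reply and parse it."""
    start = reply.find('{')
    end = reply.rfind('}')
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start:end + 1])

attributes = extract_json(model_with_image.invoke(text_prompt))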
You might want to look at Pydantic for getting consistent output. This video talks about using it with CrewAI, but it should be pretty close to what you are doing here. It might work with your smaller model.
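For reference, a minimal sketch of the Pydantic idea, assuming Pydantic v2 and the field names from the prompt above (the RoofReport class name is my own):

from pydantic import BaseModel, ValidationError

class RoofReport(BaseModel):
    roof_color: str
    roof_shape: str
    roof_primary_material: str
    roof_architecture_style: str
    roof_maintenance_condition: str
    hazards: list[str]
    roof_chimneys: int
    roof_dormers: int

raw = model_with_image.invoke(text_prompt)
try:
    report = RoofReport.model_validate_json(raw)
except ValidationError as e:
    print("model returned malformed or incomplete JSON:", e)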
Yeah, Ollama supports this, but I don't think their LangChain module supports it yet (https://python.langchain.com/api_reference/ollama/llms/langchain_ollama.llms.OllamaLLM.html#langchain_ollama.llms.OllamaLLM.with_structured_output)
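If I'm reading the linked API reference right, OllamaLLM does accept a format parameter, which at least forces syntactically valid JSON even without with_structured_output:

# Forces syntactically valid JSON output (does not enforce a specific schema).
model = OllamaLLM(model="llama3.2-vision:11b-instruct-fp16", format="json")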
I guess I don't really need LangChain here...
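If LangChain goes away, the ollama Python package can do this directly. A minimal sketch, assuming the same prompt and image file as the gist above:

import ollama

response = ollama.chat(
    model='llama3.2-vision:11b-instruct-fp16',
    messages=[{
        'role': 'user',
        'content': text_prompt,
        'images': ['testhouse.png'],  # the client accepts file paths or raw bytes
    }],
    format='json',  # ask Ollama to constrain output to valid JSON
)
print(response['message']['content'])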
If you're saying you're considering CrewAI, it's pretty straightforward. Although I do believe it uses LangChain under the hood (or at least it used to). Pydantic works well with it. I haven't had much luck with local models, but that's a hardware issue on my end. The OpenAI API has made it pretty easy to get around that, though.
Did this work well for you? Did you have any problems with the model filling out the JSON object?