| title | marimo-version | width |
|---|---|---|
| Local Llm With Ollama | 0.9.1 | medium |
import marimo as mo
import llm
import markdown
import rich
This is not too complicated though. If you visit the Ollama website, you can download the installer.
You then need to run the following command in your terminal to fetch the 2 GB model.
ollama pull llama3.2
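Once the download finishes, you can check that the model is available locally (a quick sanity check, assuming a standard Ollama install):

```bash
# List the models Ollama has stored locally; llama3.2 should appear here.
ollama list
```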
Once you have that, llm will not know to use it by default. The simplest way, assuming you're not already using llm, is to set it as the default model instead of OpenAI:
llm models default llama3.2
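You can confirm the change: running llm models default without an argument prints the current default, and llm models lists everything llm can see. If llama3.2 does not show up there, you may also need the llm-ollama plugin, installed with llm install llm-ollama.

```bash
# Print the current default model (should now be llama3.2)
llm models default

# List every model llm can see, including plugin-provided ones
llm models
```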
Then you can proceed with the other steps.
model = llm.get_model()
model
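Before wiring the model into a chat widget, a one-off prompt is an easy way to confirm it responds (a minimal sketch; the prompt text is just an example):

```python
# Send a single prompt to the default model and print the reply.
response = model.prompt("Reply in one short sentence: which model are you?")
print(response.text())
```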
Marimo has a nifty chat widget that lets you use the llm package as the endpoint.
To use it, you need to pass in a callable: a function that accepts a list of messages and returns the rendered text of the last response from the LLM. So, that's what rendered_chat() below does.
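Just to illustrate that contract (this is not part of the actual setup), a callable as simple as the following would already work; it only echoes the last message back:

```python
# A minimal chat callable: marimo passes in the full message history plus a
# config object, and whatever is returned becomes the assistant's reply.
def echo_chat(messages, config):
    return f"You said: {messages[-1].content}"

# mo.ui.chat(echo_chat) would produce a chat widget that parrots the user.
```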
The text returned from the LLM is markdown, so it still needs to be rendered. This is why we call markdown() to turn the markdown text into HTML that can be displayed on the page.
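For reference, this is all markdown.markdown() does (the input string here is just an example):

```python
import markdown

# Converts a Markdown string into an HTML string.
markdown.markdown("Some **bold** text and a [link](https://example.com)")
# -> '<p>Some <strong>bold</strong> text and a <a href="https://example.com">link</a></p>'
```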
conversation = model.conversation()
def rendered_chat(messages, config):
    """
    A function to call to return the answer to the last prompt.

    Prompts the ongoing conversation (which keeps the chat history)
    and returns the response rendered as an HTML string.
    """
    content = conversation.prompt(messages[-1].content)
    # TODO: not every element in markdown appears to be rendered right
    # see why this is in future
    rendered_markdown = markdown.markdown(content.text())
    return rendered_markdown
chat = mo.ui.chat(rendered_chat)
chat
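Regarding the TODO above: one thing worth trying (untested here) is to skip the markdown library and return mo.md(...) instead, since the chat widget can display objects that marimo already knows how to render:

```python
# A possible variant of rendered_chat that lets marimo handle the rendering.
def rendered_chat_md(messages, config):
    content = conversation.prompt(messages[-1].content)
    return mo.md(content.text())
```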