Lab 1 introduces local LLM execution using Ollama, demonstrating interactive CLI usage, raw HTTP API calls, and Python integration via LangChain.
```mermaid
graph TB
subgraph "Local Machine"
subgraph "Ollama Service :11434"
OllamaServer[Ollama Server HTTP API]
ModelStore[Model Storage llama3.2]
end
subgraph "User Interactions"
CLI[CLI Interface ollama run]
Python[Python Program simple_ollama.py]
CURL[API Client curl/HTTP]
end
subgraph "Python Stack"
LangChain[LangChain-Ollama Library]
ChatOllama[ChatOllama Class]
Python --> LangChain
LangChain --> ChatOllama
end
OllamaServer --> ModelStore
CLI --> OllamaServer
CURL --> OllamaServer
ChatOllama -->|HTTP POST :11434/api/chat| OllamaServer
end
Internet[Internet ollama.com] -.->|Pull Models| ModelStore
style OllamaServer fill:#e1f5ff
style ModelStore fill:#fff4e1
style CLI fill:#e8f5e9
style Python fill:#e8f5e9
style LangChain fill:#fff4e1
style ChatOllama fill:#ffe8e8
```

```mermaid
flowchart LR
User([User]) -->|1. Pull Model| Ollama[Ollama Server :11434]
Ollama -->|2. Store| Models[(llama3.2 Models)]
User -->|3a. CLI Query| Ollama
User -->|3b. API Query| Ollama
Ollama -->|4. Response| User
style Ollama fill:#4CAF50,color:#fff
style Models fill:#2196F3,color:#fff
```
- Purpose: Local LLM inference engine
- Port: 11434
- Functionality:
- Model management (pull, list, run)
- Inference execution
- API endpoint serving
- Location: Local filesystem (~/.ollama/models)
- Models: llama3.2 (3B parameters)
- Format: GGUF format for efficient inference
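The models stored under ~/.ollama/models can also be listed programmatically. Below is a minimal sketch (assuming the server is running on its default port) that calls Ollama's /api/tags endpoint, which returns the locally available models:

```python
# Minimal sketch: list the models stored locally by Ollama.
# Assumes the Ollama server is running on the default port 11434.
import requests

response = requests.get("http://localhost:11434/api/tags")
response.raise_for_status()

for model in response.json().get("models", []):
    # Each entry includes the model name and its size on disk in bytes
    print(model["name"], model.get("size"))
```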
- Command: `ollama run llama3.2`
- Use Case: Interactive chat, testing
- Protocol: Direct connection to Ollama server
- Endpoint: `http://localhost:11434/api/generate`
- Method: POST request with JSON payload (see the sketch after these bullets)
- Use Case: Raw API integration
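As a concrete illustration of the raw API path, here is a minimal sketch using the requests library against the /api/generate endpoint; the prompt text is just an example and assumes llama3.2 has already been pulled:

```python
# Minimal sketch: call Ollama's /api/generate endpoint directly.
# Assumes llama3.2 has already been pulled and the server is on port 11434.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "What is the capital of France?",  # example prompt
        "stream": False,  # return a single JSON object instead of a chunk stream
    },
)
print(response.json()["response"])  # the generated text
```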
- Library: `langchain-ollama`
- Class: `ChatOllama`
- Protocol: HTTP API wrapper with high-level interface
- Use Case: Building AI applications in Python
- Model pull: User → ollama pull → Internet (ollama.com) → Local Storage (~/.ollama/models)
- CLI query: User Input → Ollama CLI → HTTP :11434 → Model → Response → Terminal
```mermaid
sequenceDiagram
participant User
participant Python as Python Program
participant ChatOllama as ChatOllama Class
participant HTTP as HTTP Client
participant Ollama as Ollama Server :11434
participant Model as llama3.2 Model
User->>Python: Run simple_ollama.py
Python->>User: Prompt for input
User->>Python: "What is the capital of France?"
Note over Python,ChatOllama: LangChain Wrapper
Python->>ChatOllama: llm.invoke("What is...")
ChatOllama->>ChatOllama: Format as chat message
Note over ChatOllama,Ollama: HTTP API Call
ChatOllama->>HTTP: POST /api/chat
HTTP->>Ollama: {"model": "llama3.2", "messages": [...]}
Note over Ollama,Model: Local Inference
Ollama->>Model: Load model if needed
Model->>Model: Generate response
Model->>Ollama: "The capital of France is Paris."
Note over Ollama,Python: Response Flow
Ollama->>HTTP: JSON response
HTTP->>ChatOllama: Response data
ChatOllama->>Python: AIMessage object
Python->>User: Print response.content
```
- Local LLM execution: No cloud dependency, runs on your machine
- Model management: Download, store, and run models via Ollama
- Three interaction patterns: CLI, raw HTTP API, Python/LangChain
- LangChain abstraction: How frameworks simplify LLM integration
- ChatOllama class: Wrapper that handles HTTP communication
- HTTP protocol: Understanding client-server communication
- Inference latency: typically 2-5 seconds per response on local hardware
- Message format: How prompts are structured as chat messages (see the sketch after this list)
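To make the message format concrete, the sketch below passes an explicit list of chat messages to ChatOllama instead of a plain string; SystemMessage and HumanMessage come from langchain_core, and the prompt content is only an example:

```python
# Sketch: send an explicit list of chat messages instead of a plain string.
# Assumes Ollama is running locally with llama3.2 pulled.
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")

messages = [
    SystemMessage(content="Answer in one short sentence."),   # sets behavior
    HumanMessage(content="What is the capital of France?"),   # the user turn
]

response = llm.invoke(messages)  # returns an AIMessage
print(response.content)
```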
| Method | Complexity | Use Case | Code Example |
|---|---|---|---|
| CLI | Lowest | Testing, exploration | `ollama run llama3.2` |
| Raw HTTP | Medium | Custom integrations | `curl -X POST http://localhost:11434/api/chat` |
| Python/LangChain | Low | AI applications | `ChatOllama(model="llama3.2").invoke("...")` |
Without LangChain (raw HTTP):
```python
import requests

# Call Ollama's chat endpoint directly, without any framework
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What is AI?"}],
        "stream": False
    }
)

# The reply text lives under message.content in the JSON response
data = response.json()
print(data["message"]["content"])
```

With LangChain (ChatOllama):
```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")
response = llm.invoke("What is AI?")
print(response.content)
```

Benefits:
- Less boilerplate code
- Automatic error handling
- Consistent interface across different LLM providers
- Built-in support for advanced features (streaming, callbacks, etc.; see the streaming sketch below)
- Foundation for building agents (Lab 2+)
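As one example of those advanced features, here is a minimal streaming sketch (assuming llama3.2 is available locally); stream() yields message chunks as they are generated:

```python
# Sketch: stream tokens from ChatOllama as they are generated.
# Assumes llama3.2 is available locally.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")

# stream() yields message chunks instead of waiting for the full response
for chunk in llm.stream("Explain what an LLM is in one sentence."):
    print(chunk.content, end="", flush=True)
print()
```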
LangChain is a framework for building applications with Large Language Models (LLMs). It provides:
- Abstractions: Common interfaces for different LLM providers
- Chains: Connect multiple LLM calls together
- Memory: Maintain conversation context
- Tools: Integrate LLMs with external functions
In Lab 1, we use the `langchain-ollama` package, which is LangChain's official integration for Ollama.
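To illustrate the abstraction and chain ideas, here is a minimal sketch that pipes a prompt template into ChatOllama using the LangChain Expression Language; the template wording is arbitrary:

```python
# Sketch: a two-step chain - prompt template piped into the model.
# Assumes langchain-core and langchain-ollama are installed and llama3.2 is pulled.
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

prompt = ChatPromptTemplate.from_template("Answer in one sentence: {question}")
llm = ChatOllama(model="llama3.2")

chain = prompt | llm  # the template's output becomes the model's input

response = chain.invoke({"question": "What is the capital of France?"})
print(response.content)
```

The same pipe pattern scales to longer chains (output parsers, retrievers, tools) without changing the calling code.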
ChatOllama is a LangChain class that wraps the Ollama HTTP API, providing:
- Simplified API: Clean Python interface instead of raw HTTP
- Message Formatting: Automatically formats prompts as chat messages
- Error Handling: Manages connection issues gracefully
- Type Safety: Returns structured Python objects (AIMessage)
Under the hood, ChatOllama:
- Converts your prompt into Ollama's expected JSON format
- Makes HTTP POST requests to `http://localhost:11434/api/chat`
- Parses JSON responses into Python objects
- Handles streaming and non-streaming responses
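The same round trip can be reproduced by hand. The sketch below is illustrative only (it is not ChatOllama's actual implementation): it posts the chat payload with requests and wraps the reply in the AIMessage type that ChatOllama returns:

```python
# Illustrative sketch of a non-streaming invoke() round trip (not ChatOllama's
# actual source): POST the chat payload, then wrap the reply as an AIMessage.
import requests
from langchain_core.messages import AIMessage

payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "What is AI?"}],  # example prompt
    "stream": False,
}
raw = requests.post("http://localhost:11434/api/chat", json=payload).json()

message = AIMessage(content=raw["message"]["content"])
print(message.content)
```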
```mermaid
graph TB
subgraph "Your Python Code"
Code[simple_ollama.py]
end
subgraph "LangChain Layer"
ChatOllama[ChatOllama Class]
Formatter[Message Formatter]
HTTPClient[HTTP Client]
ChatOllama --> Formatter
Formatter --> HTTPClient
end
subgraph "Ollama Server"
API[REST API :11434]
Router[Endpoint Router]
Inference[Inference Engine]
API --> Router
Router --> Inference
end
subgraph "Model Layer"
ModelFile[llama3.2 GGUF]
end
Code -->|1. llm.invoke| ChatOllama
HTTPClient -->|2. HTTP POST| API
Inference -->|3. Load| ModelFile
ModelFile -->|4. Generate| Inference
Inference -->|5. JSON| API
API -->|6. Response| HTTPClient
HTTPClient -->|7. AIMessage| Code
style ChatOllama fill:#e1f5ff
style API fill:#fff4e1
style Code fill:#e8f5e9
style ModelFile fill:#ffe8e8
```
```python
# simple_ollama.py - Basic Ollama interaction
# Step 1: Import LangChain's Ollama integration
from langchain_ollama import ChatOllama
# Step 2: Create ChatOllama instance
# This establishes the connection parameters but doesn't connect yet
llm = ChatOllama(
model="llama3.2", # Which Ollama model to use
base_url="http://localhost:11434", # Ollama server (default)
temperature=0.7 # Controls randomness (0=deterministic, 1=creative)
)
# Step 3: Get user input
user_prompt = input("Enter your prompt: ")
# Step 4: Send to Ollama and get response
# invoke() does the following:
# a. Formats prompt as {"role": "user", "content": "..."}
# b. POSTs to http://localhost:11434/api/chat
# c. Waits for response
# d. Returns AIMessage object
response = llm.invoke(user_prompt)
# Step 5: Display result
# response.content contains the LLM's text response
print(f"\nResponse: {response.content}")flowchart TD
Start([invoke called]) --> Format[Format prompt as chat message]
Format --> Build[Build HTTP request JSON]
Build --> Post[POST to :11434/api/chat]
Post --> Wait[Wait for Ollama response]
Wait --> Parse[Parse JSON response]
Parse --> Create[Create AIMessage object]
Create --> Return([Return AIMessage])
style Start fill:#e8f5e9
style Post fill:#e1f5ff
style Return fill:#fff4e1
```
HTTP Request Sent:
```
POST http://localhost:11434/api/chat
{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
],
"stream": false
}
```

HTTP Response Received:
```
{
"model": "llama3.2",
"created_at": "2025-01-15T10:30:00Z",
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"done": true
}
```

Terminal Session:
```
$ python simple_ollama.py
Enter your prompt: What is the capital of France?
Response: The capital of France is Paris.
```

Behind the Scenes:
1. User runs Python script
2. ChatOllama instance created (no network call yet)
3. User types prompt
4. invoke() called → HTTP POST to localhost:11434
5. Ollama loads llama3.2 model (if not already loaded)
6. Model generates response (~2-5 seconds)
7. Response returned as JSON
8. ChatOllama parses JSON → AIMessage
9. Python prints response.content
- Type: Standalone local service
- Complexity: Low
- Dependencies: None (self-contained)
- Latency: ~2-5 seconds per query (local CPU/GPU)
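If you want to check the latency figure on your own hardware, a quick timing sketch follows; note that the first call is usually slower because the model has to be loaded into memory:

```python
# Sketch: time a single local query to get a feel for inference latency.
import time

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")

start = time.perf_counter()
response = llm.invoke("What is the capital of France?")
elapsed = time.perf_counter() - start

print(f"{elapsed:.1f}s  {response.content}")
```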
For training purposes only. (C) 2025 Tech Skills Transformations and Brent C. Laster - all rights reserved.