LLaMA-2 QLoRA Fine-Tuning Walkthrough

This document explains in detail how we fine-tune the NousResearch/Llama-2-7b-chat-hf model on a financial tweet sentiment dataset using the QLoRA method. The training is done using Hugging Face Transformers, PEFT (LoRA), and bitsandbytes for 4-bit quantization.


📂 Model & Dataset

model_name = "NousResearch/Llama-2-7b-chat-hf"
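Below is a minimal sketch of the QLoRA setup described above, assuming the standard Transformers, PEFT, and bitsandbytes APIs. The model name comes from this document; the LoRA hyperparameters (r, alpha, dropout, target modules), the quantization settings, and the dataset placeholder are illustrative assumptions rather than the exact values used in the original run.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "NousResearch/Llama-2-7b-chat-hf"

# 4-bit NF4 quantization via bitsandbytes (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: only these small low-rank matrices are trained
lora_config = LoraConfig(
    r=16,                                 # placeholder rank
    lora_alpha=32,                        # placeholder scaling factor
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # placeholder module list
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The financial tweet sentiment dataset would be loaded here, e.g. with
# datasets.load_dataset("<your-financial-tweet-dataset>") -- the exact
# dataset name is not specified in this document.
```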

LoRA vs Full Fine-Tuning vs QLoRA

Real-World Scenario

Task: Fine-tune a base language model to act as a customer support chatbot for a telecom company.


Full Fine-Tuning

LangGraph State Management Heuristics

This document outlines heuristics and warning signs that indicate when your LangGraph state may be too large or messy to manage effectively.


🔁 1. Cyclic or Deeply Nested State Patterns

  • Sign: Heavy reliance on nested dictionaries/lists (state["a"]["b"]["c"]).
  • Why it matters: Updates become hard to track, and the state becomes hard to read and debug.
  • Heuristic: More than 2–3 levels of nesting or needing custom update logic indicates excessive complexity.
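As a quick illustration of this heuristic, here is a hedged sketch contrasting a deeply nested state with a flat, typed schema; the field names are hypothetical.

```python
from typing import List, TypedDict

# The pattern the heuristic warns about: updates have to reach through
# several layers, e.g. state["research"]["sources"]["primary"].append(...)
nested_state = {
    "research": {
        "sources": {
            "primary": [],
            "secondary": [],
        },
        "summary": "",
    },
}

# A flatter, typed alternative: each node reads/writes one explicit key,
# which keeps updates easy to track and debug.
class ResearchState(TypedDict):
    query: str
    primary_sources: List[str]
    secondary_sources: List[str]
    summary: str

# graph = StateGraph(ResearchState)  # would then serve as the graph schema
```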

Choosing Between Agents and Graphs for Dynamic Decision-Making

When designing a system that requires dynamic, adaptive decision-making, the choice between an agent (e.g., a LangChain or OpenAI-style tool-using agent) and a graph (e.g., LangGraph) depends on several factors.


✅ Use a Graph When:

1. You Have a Known Workflow or Flowchart

  • Tasks follow a sequence with branching logic.
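As a sketch of what "a known workflow with branching logic" looks like as a graph, here is a minimal LangGraph example. The node names, routing rule, and state fields are hypothetical, and the StateGraph / add_conditional_edges usage should be checked against your installed LangGraph version.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class TicketState(TypedDict):
    text: str
    category: str
    reply: str

def classify(state: TicketState) -> dict:
    # A real system might call an LLM or a rules engine here.
    category = "billing" if "invoice" in state["text"].lower() else "technical"
    return {"category": category}

def handle_billing(state: TicketState) -> dict:
    return {"reply": "Routing to the billing workflow."}

def handle_technical(state: TicketState) -> dict:
    return {"reply": "Routing to the technical workflow."}

def route(state: TicketState) -> str:
    # The branch to take is read directly from state.
    return state["category"]

builder = StateGraph(TicketState)
builder.add_node("classify", classify)
builder.add_node("billing", handle_billing)
builder.add_node("technical", handle_technical)
builder.add_edge(START, "classify")
builder.add_conditional_edges(
    "classify", route, {"billing": "billing", "technical": "technical"}
)
builder.add_edge("billing", END)
builder.add_edge("technical", END)

app = builder.compile()
result = app.invoke({"text": "My invoice looks wrong", "category": "", "reply": ""})
```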

In the context of systems involving large language models (LLMs), which component is typically considered "artificial intelligence" (AI) depends on the scope and intent of the definition being applied. Here's a breakdown:

1. The LLM Itself

  • Defined as AI: Yes.
  • Reason: A large language model, such as GPT, is considered AI because it demonstrates capabilities associated with intelligent behavior, such as understanding and generating human-like language, summarizing information, answering questions, and even a degree of reasoning within its learned, probabilistic parameters.

2. The Deterministic State Machine for Preprocessing Input

  • Defined as AI: Not typically.
  • Reason: A deterministic state machine follows fixed, rule-based logic without learning or adapting. It is considered a traditional programming construct, not AI, because it lacks traits like learning, adaptation, or probabilistic inference.
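For contrast, here is a purely illustrative, hypothetical example of such a deterministic preprocessing step: every input maps to exactly one outcome by fixed rules, with no learning or probabilistic inference involved.

```python
def preprocess(raw_input: str) -> dict:
    """Fixed, rule-based handling of user input: no learning, no probabilities."""
    text = raw_input.strip()

    if not text:
        state = "REJECT_EMPTY"
    elif len(text) > 2000:
        state = "TRUNCATED"
        text = text[:2000]
    else:
        state = "ACCEPTED"

    # The same input always produces the same state and output.
    return {"state": state, "text": text.lower()}

print(preprocess("  My INVOICE looks wrong  "))  # {'state': 'ACCEPTED', ...}
```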

When building a supervisor for a system that includes a large language model (LLM), the preferred approach usually depends on the requirements for reliability, transparency, and control.

Preferred Approach: Deterministic Logic Outside the Graph

  • Reason: Deterministic logic offers predictable and debuggable behavior, which is crucial when supervising, orchestrating, or enforcing policies around the use of LLMs.
  • Use Cases: Supervising task delegation, routing, retries, safety checks, compliance enforcement, logging, and fallback mechanisms.
  • Advantages:
    • Easier to test and verify.
    • Transparent decision-making.
    • Easier integration with existing systems.
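A minimal sketch of what "deterministic logic outside the graph" can look like, assuming the LLM is exposed as a plain callable. The retry policy, the banned-terms check, and the fallback message are illustrative assumptions, not a prescribed design.

```python
import logging
import time

logger = logging.getLogger("supervisor")
BANNED_TERMS = {"credit card number"}  # placeholder policy list

def violates_policy(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BANNED_TERMS)

def supervised_call(llm_generate, prompt: str, max_retries: int = 2) -> str:
    """llm_generate is any callable that takes a prompt string and returns a string."""
    for attempt in range(max_retries + 1):
        try:
            reply = llm_generate(prompt)
        except Exception as exc:  # transient API/network errors
            logger.warning("LLM call failed (attempt %d): %s", attempt, exc)
            time.sleep(2 ** attempt)  # deterministic backoff
            continue
        if violates_policy(reply):
            logger.info("Policy violation detected; rejecting reply.")
            return "I'm sorry, I can't help with that request."
        return reply
    return "The service is temporarily unavailable. Please try again later."

# Example with a stand-in generator; swap in your real client call.
print(supervised_call(lambda prompt: "Your bill increased due to a plan change.",
                      "Why is my bill higher this month?"))
```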

When designing agent specialization through system prompts, the choice between hardcoding prompts at setup and dynamically generating them at runtime often depends on the required flexibility, interpretability, and control.

Strategy 1: Hardcoded System Prompts (Static)

  • Use Case: When agents serve stable, well-defined roles.
  • Advantages:
    • Easy to audit and debug.
    • Predictable behavior.
    • Simpler deployment.
  • Disadvantages:
    • Inflexible: changing an agent's role requires editing the prompt and redeploying.
    • Cannot adapt to runtime context such as the user, the task, or the conversation state.
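A small, hypothetical sketch of the two strategies side by side; the role names, prompt text, and the build_prompt helper are made up for illustration.

```python
# Strategy 1: hardcoded at setup. Easy to audit and identical on every run.
STATIC_PROMPTS = {
    "researcher": "You are a research assistant. Cite your sources.",
    "writer": "You are a technical writer. Be concise and precise.",
}

# Strategy 2: generated at runtime. Adapts to context, but harder to audit.
def build_prompt(role: str, user_tier: str, language: str) -> str:
    return (
        f"You are a {role} for a {user_tier}-tier customer. "
        f"Respond in {language}. Follow company policy at all times."
    )

print(STATIC_PROMPTS["researcher"])
print(build_prompt("support agent", "premium", "English"))
```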

Using LangSmith with LangGraph enables robust observability in multi-agent systems, especially when teams (like research teams using tools such as Tavily) are coordinated via a supervisor agent. Observability ensures transparency, debuggability, and traceability of workflows across the graph.


🧠 Why Use Observability in a LangGraph?

  • Debug execution flows across agents and tools.
  • Visualize agent decisions, tool calls, and responses.
  • Track latency, success/failure, and metadata.
  • Compare behavior across runs or task types.
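A minimal way to turn this on is to point LangChain/LangGraph at LangSmith via environment variables before building or invoking the graph. The project name below is a placeholder, and the variable names should be checked against the LangSmith docs for your installed version.

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "research-team-supervisor"  # placeholder name

# Any LangChain / LangGraph code executed after this point is traced
# automatically; each run then shows agent decisions, tool calls (e.g., Tavily),
# latency, and errors in the LangSmith UI.
```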

LangGraph is absolutely usable in user-facing applications — but certain patterns and architectural strategies help make it more responsive. When full runs take upwards of 76 seconds, the key is handling perceived latency through streaming, asynchronous execution, or background task management.


⏱️ Is LangGraph Too Slow for UX?

No, but raw sequential execution without streaming or feedback can lead to poor UX. For responsive UIs, consider:

  • Streaming partial results (especially from LLMs); see the sketch below
  • Background execution + progress polling
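A minimal streaming sketch, assuming `app` is a compiled LangGraph graph (the result of `builder.compile()`, as in the branching example earlier). With `stream_mode="updates"`, each node's partial state update is yielded as soon as that node finishes, so the UI can show progress instead of waiting for the full run.

```python
inputs = {"text": "My invoice looks wrong", "category": "", "reply": ""}

for chunk in app.stream(inputs, stream_mode="updates"):
    # Each chunk maps a node name to the partial state that node just produced;
    # push these to the client (e.g., over SSE or WebSockets) as they arrive.
    for node_name, update in chunk.items():
        print(node_name, update)
```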

In a LangGraph-based multi-agent setup, when a researcher agent produces an output, this output is passed into a supervisor agent. The supervisor uses the output to determine which edge of the graph to traverse next. This often involves wrapping the researcher's output in a structured message or passing it as part of a system prompt.


🔄 How Context Propagation Works

  • Step 1: Researcher agent queries tools (e.g., Tavily) and returns a summary.
  • Step 2: This summary becomes the input context for the supervisor.
  • Step 3: The supervisor LLM is prompted with this context + a system prompt asking for decision routing.
  • Step 4: Based on reasoning, the supervisor picks the next step in the graph.
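A hedged sketch of this propagation pattern follows. The state fields, node names, and the hardcoded routing decision (standing in for an actual LLM call) are assumptions for illustration, not the original setup.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class TeamState(TypedDict):
    task: str
    research_summary: str
    next_step: str

def researcher(state: TeamState) -> dict:
    # Step 1: in a real system this would query tools such as Tavily.
    return {"research_summary": f"Key findings for: {state['task']}"}

def supervisor(state: TeamState) -> dict:
    # Steps 2-3: the researcher's summary is wrapped into the supervisor's prompt.
    prompt = (
        "You are a supervisor. Based on the research summary below, reply with "
        "exactly one of: 'write_report' or 'more_research'.\n\n"
        f"Summary: {state['research_summary']}"
    )
    # decision = llm.invoke(prompt).content.strip()  # real LLM call goes here
    decision = "write_report"                        # hardcoded for this sketch
    return {"next_step": decision}

def route(state: TeamState) -> str:
    # Step 4: the conditional edge follows the supervisor's decision.
    return state["next_step"]

def write_report(state: TeamState) -> dict:
    return {"next_step": "done"}

builder = StateGraph(TeamState)
builder.add_node("researcher", researcher)
builder.add_node("supervisor", supervisor)
builder.add_node("write_report", write_report)
builder.add_edge(START, "researcher")
builder.add_edge("researcher", "supervisor")
builder.add_conditional_edges(
    "supervisor", route,
    {"write_report": "write_report", "more_research": "researcher"},
)
builder.add_edge("write_report", END)
graph = builder.compile()
```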