
πŸ“Š LANGSMITH.md

Overview

LangSmith (by LangChain) and LangFuse are powerful observability and analytics tools for LLM workflows. You can use them to track, analyze, and improve model performance β€” and even synthesize better training data using real user interactions.

This guide covers:

  • How to connect LangSmith or LangFuse
  • How to log successful Q&A pairs
  • How to use logged data for fine-tuning or RAG
  • Sample integration script
  • Dashboard template ideas
  • End-to-end feedback loop

πŸ”Œ Connecting to LangSmith / LangFuse

LangSmith Setup

  1. Install LangChain and LangSmith SDK:
    pip install langchain langsmith
  2. Set environment variables:
    export LANGCHAIN_TRACING_V2=true
    export LANGCHAIN_API_KEY="your_langsmith_api_key"
    export LANGCHAIN_PROJECT="your_project_name"
  3. Wrap your chains:
    from langsmith import traceable
    
    @traceable(name="user_question")
    def answer_question(prompt):
        # llm is your existing model callable (e.g., a LangChain chat model)
        return llm(prompt)

LangFuse Setup

  1. Install LangFuse SDK:
    npm install langfuse
  2. Use their Node/TS SDK in your backend logic:
    import { Langfuse } from "langfuse";
    
    const langfuse = new Langfuse({ publicKey: '...', secretKey: '...' });
    // trace() records one unit of work; pass your app's own input/output values
    const trace = langfuse.trace({ name: "query", input: userInput, output: modelOutput });

βœ… Tracking Successful Question/Answer Pairs

You can tag data based on feedback:

With LangSmith:

  • Use metadata:
    trace_metadata = {
      "user_feedback": "πŸ‘",
      "task_type": "qa",
      "user_id": "123"
    }
  • Mark high-quality interactions with tags like success=true
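
For example, both can be attached to a specific call at runtime. A minimal sketch, assuming the answer_question function from the setup section; langsmith_extra is the runtime hook that @traceable-decorated functions accept for per-call metadata and tags:

# Attach user feedback metadata and a success tag to this particular run
answer_question(
    "How do I reset my password?",
    langsmith_extra={
        "metadata": {"user_feedback": "👍", "task_type": "qa", "user_id": "123"},
        "tags": ["success=true"],
    },
)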

With LangFuse:

  • Use trace-level metadata or custom properties:
    const trace = langfuse.trace({
      name: "answer_check",
      input: question,
      output: answer,
      metadata: { user_score: 5, was_helpful: true }
    });
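
LangFuse also ships a Python SDK. A minimal sketch of the same idea in Python, assuming the v2-style API where trace() creates a trace and score() attaches a rating to it (question and answer come from your own app):

from langfuse import Langfuse

langfuse = Langfuse(public_key="...", secret_key="...")

# Record the Q&A exchange as a trace with quality metadata
trace = langfuse.trace(
    name="answer_check",
    input=question,
    output=answer,
    metadata={"user_score": 5, "was_helpful": True},
)

# Optionally attach an explicit score object to the trace
langfuse.score(trace_id=trace.id, name="user_score", value=5)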

πŸ§ͺ Synthesizing Fine-Tuning Data

Once you've tracked a set of high-quality examples:

  1. Export from LangSmith (UI or API):

    • Filter by tag: success=true
    • Extract prompt/response pairs
  2. Format for fine-tuning:

    { "prompt": "How do I reset my password?", "completion": "Go to settings and click 'Reset Password'." }
  3. Use for supervised fine-tuning or to enrich your RAG index.
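
As a sketch of steps 1–2 via the Python client: this assumes runs were tagged success=true as in the previous section, and the prompt/completion flattening is a placeholder to adapt to your chain's input/output schema:

import json
from langsmith import Client

client = Client()

# Stream runs from the project and keep only those tagged as successes
with open("finetune.jsonl", "w") as f:
    for run in client.list_runs(project_name="your_project_name"):
        if "success=true" in (run.tags or []):
            pair = {
                "prompt": str(run.inputs),
                "completion": str(run.outputs),
            }
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")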


πŸ’» Sample LangSmith Integration Script

from langchain_openai import ChatOpenAI  # pip install langchain-openai
from langsmith import traceable

chat = ChatOpenAI()

@traceable(name="chat_response")
def respond_to_user(message: str) -> str:
    # Tracing is enabled via the LANGCHAIN_* environment variables set earlier
    return chat.invoke(message).content

# Example usage
response = respond_to_user("What are the store hours?")
print(response)

πŸ“Š Sample Dashboard Template (LangSmith or LangFuse)

A good dashboard might include:

Success Rate Panel

  • Total requests
  • Percentage of successful/approved answers

Latency Panel

  • Average response time
  • P95 response time

Feedback Panel

  • Most common πŸ‘ / πŸ‘Ž reasons
  • Histogram of user ratings

Top Queries Table

  • Query text
  • User score or outcome
  • Timestamp + tags

Drop-off Funnel

  • Steps from initial query to completion or re-ask
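
Both tools provide charts like these out of the box, but the first two panels are also easy to compute from exported run data. A minimal sketch, assuming langsmith Run objects like those returned by list_runs above (start_time/end_time are datetimes, tags an optional list):

from statistics import quantiles

def dashboard_metrics(runs):
    # Compute the Success Rate and Latency panels from exported runs
    runs = list(runs)
    latencies = [
        (r.end_time - r.start_time).total_seconds()
        for r in runs
        if r.start_time and r.end_time
    ]
    successes = sum("success=true" in (r.tags or []) for r in runs)
    return {
        "total_requests": len(runs),
        "success_rate": successes / len(runs) if runs else 0.0,
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
        "p95_latency_s": quantiles(latencies, n=20)[18] if len(latencies) >= 2 else 0.0,
    }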

πŸ” End-to-End Feedback Loop

To close the loop from user interaction to model improvement:

  1. Capture Live Interactions using LangSmith or LangFuse.
  2. Tag & Rate Responses via:
    • Thumbs up/down buttons
    • 1–5 star rating
    • Free-text user comments
  3. Log & Store Data into:
    • LangSmith's project dashboard
    • LangFuse logs with metadata
  4. Aggregate Insights through analytics dashboards.
  5. Export High-Quality Pairs and curate a fine-tuning dataset.
  6. Retrain or Augment your model or RAG index.
  7. Monitor Improvements over time via version comparison.
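
To make steps 2–3 concrete, here is a minimal sketch of a feedback endpoint, assuming a FastAPI backend and that the UI knows the LangSmith run ID for the answer being rated (for example, a run_id you pre-generated and passed via langsmith_extra). create_feedback is the LangSmith client method that attaches scores to runs:

from fastapi import FastAPI
from langsmith import Client

app = FastAPI()
client = Client()

@app.post("/feedback")
def record_feedback(run_id: str, thumbs_up: bool, comment: str = ""):
    # Store the rating against the traced run so it appears in LangSmith
    client.create_feedback(
        run_id=run_id,
        key="user_feedback",
        score=1.0 if thumbs_up else 0.0,
        comment=comment,
    )
    return {"status": "recorded"}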

Optional additions:

  • Slack or Discord bots to request feedback inline
  • Airtable or Notion databases for tagging, editing, and review
  • Automatic alerts that notify the team when accuracy drops

πŸ“ˆ Bonus: Analytics to Improve UX

  • Track latency, response quality, and fallback usage.
  • Identify high-dropout paths or frequent re-asks.
  • Use heatmaps or feedback tags to guide your iteration roadmap.

Summary

LangSmith and LangFuse allow you to go beyond black-box LLM usage. With traceable observability and feedback tagging, you can:

  • Identify what works
  • Create datasets from real usage
  • Improve LLM accuracy through continuous learning

Use observability as a data refinery to evolve smarter models.
