Tagging Langfuse Conversations for User + Site Conversions

This guide outlines how to tag Langfuse sessions or traces for a user and site when a conversion (e.g., moving from free to paid) occurs, so that converted and non-converted conversations can be filtered later. Since Langfuse's API doesn't natively support key-based metadata filtering (e.g., by site_id), we combine native filters with client-side processing.

Approach

1. Fetch sessions in a time window (e.g., last 24 hours) with a native timestamp filter.
2. Fetch traces for each session using session_id, keeping only those that belong to the user.
3. Filter traces client-side by metadata.site_id.
4. Apply a score (e.g., 24h_before_conversion=1) to matching traces or sessions for easy filtering in Langfuse.
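
The client-side filtering step is plain dictionary filtering once the traces are in memory. A minimal sketch, assuming each trace has been reduced to a dict with an optional metadata mapping (the sample records and the helper name filter_by_site are illustrative, not part of the Langfuse SDK):

```python
from typing import Any, Dict, List


def filter_by_site(traces: List[Dict[str, Any]], site_id: str) -> List[Dict[str, Any]]:
    """Keep traces whose metadata carries the given site_id."""
    matches = []
    for trace in traces:
        metadata = trace.get("metadata")
        # Metadata may be missing or not a mapping, so guard before reading it.
        if isinstance(metadata, dict) and metadata.get("site_id") == site_id:
            matches.append(trace)
    return matches


# Hypothetical sample data standing in for fetched traces.
sample_traces = [
    {"id": "t1", "metadata": {"site_id": "site456"}},
    {"id": "t2", "metadata": {"site_id": "site999"}},
    {"id": "t3", "metadata": None},
]
print([t["id"] for t in filter_by_site(sample_traces, "site456")])
```

Traces with missing or malformed metadata are skipped rather than raising, which matters when site_id was not set consistently on older traces.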

Example Implementation (Python SDK)

Below is a Python script to tag traces when a conversion occurs for a user + site combo.

from langfuse import Langfuse
from datetime import datetime, timedelta, timezone
import os

langfuse = Langfuse(
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    host=os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com")
)

PAGE_SIZE = 100  # Results requested per API page

def fetch_all(list_fn, **filters):
    """Collect every page from a paginated list call such as langfuse.api.trace.list."""
    items, page = [], 1
    while True:
        response = list_fn(page=page, limit=PAGE_SIZE, **filters)
        items.extend(response.data)
        if len(response.data) < PAGE_SIZE:  # A short page means no more results
            return items
        page += 1

def tag_pre_conversion(user_id: str, site_id: str, hours_back: int = 24):
    """
    Tag traces in the last `hours_back` hours for a user + site as pre-conversion.
    
    Args:
        user_id: User identifier
        site_id: Site identifier (from trace metadata)
        hours_back: Time window to look back (default: 24 hours)
    
    Returns:
        List of tagged traces
    """
    # Calculate time window (timezone-aware; datetime.utcnow() is deprecated)
    cutoff_time = datetime.now(timezone.utc) - timedelta(hours=hours_back)

    # Fetch sessions in window. Method names follow the v3 Python SDK
    # (langfuse.api.*); older SDK versions expose different names.
    sessions = fetch_all(langfuse.api.sessions.list, from_timestamp=cutoff_time)

    matching_traces = []
    for session in sessions:
        # Fetch this user's traces for the session
        traces = fetch_all(
            langfuse.api.trace.list,
            session_id=session.id,
            user_id=user_id,
            from_timestamp=cutoff_time,
        )
        
        # Filter traces by metadata.site_id
        for trace in traces:
            if isinstance(trace.metadata, dict) and trace.metadata.get("site_id") == site_id:
                matching_traces.append(trace)
    
    # Score matching traces
    for trace in matching_traces:
        langfuse.create_score(
            trace_id=trace.id,
            name="24h_before_conversion",
            value=1,
            data_type="NUMERIC",  # Or "BOOLEAN" for True/False
            comment=f"Tagged on conversion for user {user_id} site {site_id}"
        )
    
    # Scores are sent in the background; flush before a short-lived job exits
    langfuse.flush()

    print(f"Tagged {len(matching_traces)} traces for user {user_id} + site {site_id}")
    return matching_traces

# Example usage: Call on conversion event (e.g., webhook)
tag_pre_conversion(user_id="user123", site_id="site456")

Notes

Why this works: Uses native user_id and session_id filters to reduce API calls, with client-side site_id filtering (lightweight JSON parsing).
Performance:
- Paginate (page param) for high volumes; a single request returns only one page of sessions or traces.
- Run in a background job (e.g., Celery) triggered by conversion webhooks.
Edge cases:
- If sessions aren't used, query traces directly with the user_id and timestamp filters, then filter by metadata.site_id client-side.
- Ensure site_id is consistently in trace metadata.
Filtering in Langfuse: After tagging, use the Langfuse UI or API to filter traces with 24h_before_conversion=1 for converted conversations.
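
The background-job note under Performance can be sketched with the standard library. This uses a ThreadPoolExecutor in place of Celery to stay self-contained; the payload keys, handle_conversion_event, and fake_tag are hypothetical names for illustration, and in practice the tagging function from the script would be passed as tag_fn:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Small worker pool so the webhook handler can return immediately.
executor = ThreadPoolExecutor(max_workers=2)


def handle_conversion_event(
    payload: Dict[str, str],
    tag_fn: Callable[[str, str], int],
) -> Future:
    """Read the (assumed) payload fields and schedule tagging off the request path."""
    user_id = payload["user_id"]
    site_id = payload["site_id"]
    return executor.submit(tag_fn, user_id, site_id)


# Stub standing in for the real tagging function; it records its arguments.
calls: List[Tuple[str, str]] = []


def fake_tag(user_id: str, site_id: str) -> int:
    calls.append((user_id, site_id))
    return 0  # Number of traces tagged by the stub


future = handle_conversion_event({"user_id": "user123", "site_id": "site456"}, fake_tag)
print(future.result(timeout=5), calls)
```

A real task queue adds retries and persistence that a thread pool does not, so this is only a shape for the handoff, not a replacement for one.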

Alternatives

Session-Level Scoring: Apply scores to sessions instead of traces for holistic chat thread evaluation.

langfuse.create_score(
    session_id=session.id,
    name="24h_before_conversion",
    value=1,
    data_type="NUMERIC"
)
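
When scoring at the session level, each session should be scored once even if several of its traces matched. A minimal sketch of collecting distinct session IDs, assuming each matching trace exposes a session_id attribute that may be None (TraceStub is a stand-in for illustration, not an SDK type):

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TraceStub:
    id: str
    session_id: Optional[str]


def unique_session_ids(traces: Iterable[TraceStub]) -> List[str]:
    """Return distinct session IDs in first-seen order, skipping traces without a session."""
    seen = set()
    ordered = []
    for trace in traces:
        if trace.session_id and trace.session_id not in seen:
            seen.add(trace.session_id)
            ordered.append(trace.session_id)
    return ordered


matching = [
    TraceStub("t1", "s1"),
    TraceStub("t2", "s1"),
    TraceStub("t3", None),
    TraceStub("t4", "s2"),
]
print(unique_session_ids(matching))
```

Each returned ID can then be passed as session_id to one scoring call.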

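Self-Hosted DB Queries: If self-hosting Langfuse, query the database directly (e.g., on Postgres, SELECT * FROM traces WHERE metadata->>'site_id' = '123' AND user_id = 'user123'). Table names and the storage backend differ between Langfuse versions, so check the deployed schema first.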
Export for Analytics: Export traces to S3/CSV and query with Pandas/SQL for complex analysis.
Tags for Filtering: Store site_id as a trace tag (e.g., site:123) for native filtering.
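
For the export route, a small SQL query is often enough to compare converted and non-converted traffic. A minimal sketch using an in-memory SQLite table; the column layout (trace_id, user_id, site_id, converted) and the sample rows are assumptions about how an export might be flattened, not a Langfuse export format:

```python
import sqlite3

# Hypothetical flattened export: one row per trace with its conversion score.
rows = [
    ("t1", "user123", "site456", 1),
    ("t2", "user123", "site456", 1),
    ("t3", "user789", "site456", 0),
    ("t4", "user789", "site999", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traces (trace_id TEXT, user_id TEXT, site_id TEXT, converted INTEGER)"
)
conn.executemany("INSERT INTO traces VALUES (?, ?, ?, ?)", rows)

# Count converted vs. total traces per site.
summary = conn.execute(
    "SELECT site_id, SUM(converted), COUNT(*) FROM traces "
    "GROUP BY site_id ORDER BY site_id"
).fetchall()
conn.close()
print(summary)
```

The same query shape carries over to a warehouse table or a Pandas groupby once the export is loaded.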

References

Langfuse Python SDK: https://langfuse.com/docs/sdk/python
API Docs: https://api.reference.langfuse.com
Self-Hosting Schema: https://langfuse.com/docs/deployment/self-host