This guide outlines how to tag Langfuse sessions or traces for a user and site when a conversion (e.g., moving from free to paid) occurs, allowing filtering of converted vs. non-converted conversations. Since Langfuse's API doesn't natively support key-based metadata filtering (e.g., site_id), we combine native filters with client-side processing.
- Fetch sessions for a user in a time window (e.g., last 24 hours) using `user_id` and `start_time_after`.
- Fetch traces for each session using `session_id`.
- Filter traces client-side by `metadata.site_id`.
- Apply a score (e.g., `24h_before_conversion=1`) to matching traces or sessions for easy filtering in Langfuse.
Below is a Python script to tag traces when a conversion occurs for a user + site combo.
```python
from langfuse import Langfuse
from datetime import datetime, timedelta, timezone
import os

langfuse = Langfuse(
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    host=os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"),
)


def tag_pre_conversion(user_id: str, site_id: str, hours_back: int = 24):
    """
    Tag traces from the last `hours_back` hours for a user + site as pre-conversion.

    Args:
        user_id: User identifier
        site_id: Site identifier (from trace metadata)
        hours_back: Time window to look back (default: 24 hours)

    Returns:
        List of tagged traces
    """
    # Calculate the start of the time window as a UTC ISO 8601 timestamp.
    # datetime.utcnow() is deprecated, so use a timezone-aware "now" instead.
    cutoff_time = datetime.now(timezone.utc) - timedelta(hours=hours_back)
    cutoff_iso = cutoff_time.isoformat().replace("+00:00", "Z")

    # NOTE: `list_sessions` and `list_traces` are used as written in this guide.
    # List-method and parameter names differ between SDK versions (for example,
    # v2 exposes `fetch_sessions`/`fetch_traces`, v3 exposes list calls under
    # `langfuse.api`), so verify them against the SDK you have installed.

    # Fetch sessions for user in window
    sessions = langfuse.list_sessions(
        user_id=user_id,
        start_time_after=cutoff_iso,
        limit=500,  # Adjust based on volume; paginate if needed
    )

    matching_traces = []
    for session in sessions:
        # Fetch traces for this session
        traces = langfuse.list_traces(
            session_id=session.id,
            start_time_after=cutoff_iso,
            limit=1000,  # Per session; adjust as needed
        )
        # Filter traces by metadata.site_id
        for trace in traces:
            if trace.metadata and trace.metadata.get("site_id") == site_id:
                matching_traces.append(trace)

    # Score matching traces
    for trace in matching_traces:
        langfuse.create_score(
            trace_id=trace.id,
            name="24h_before_conversion",
            value=1,
            data_type="NUMERIC",  # Or "BOOLEAN" for True/False
            comment=f"Tagged on conversion for user {user_id} site {site_id}",
        )

    # Scores are sent in the background; flush so none are lost if the process exits.
    langfuse.flush()

    print(f"Tagged {len(matching_traces)} traces for user {user_id} + site {site_id}")
    return matching_traces


# Example usage: Call on conversion event (e.g., webhook)
tag_pre_conversion(user_id="user123", site_id="site456")
```

- Why this works: Uses native `user_id` and `session_id` filters to reduce API calls, with client-side `site_id` filtering (lightweight JSON parsing).
- Performance:
  - Paginate (the Langfuse list endpoints take `page` and `limit`) for high volumes (>500 sessions or >1000 traces).
  - Run in a background job (e.g., Celery) triggered by conversion webhooks.
- Edge cases:
  - If sessions aren't used, query traces directly with `list_traces(user_id=..., start_time_after=...)`, which filters by `user_id` natively, then filter client-side by `metadata.site_id`.
  - Ensure `site_id` is consistently in trace metadata.
- Filtering in Langfuse: After tagging, use the Langfuse UI or API to filter traces with `24h_before_conversion=1` for converted conversations.
- Session-Level Scoring: Apply scores to sessions instead of traces for holistic chat thread evaluation.

  ```python
  langfuse.create_score(
      session_id=session.id,
      name="24h_before_conversion",
      value=1,
      data_type="NUMERIC",
  )
  ```
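- Self-Hosted DB Queries: If self-hosting Langfuse on a version that stores traces in Postgres, query the database directly (e.g., `SELECT * FROM trace WHERE metadata->>'site_id' = '123' AND user_id = 'user123'`). Table and column names depend on your schema version, and Langfuse v3 keeps traces in ClickHouse rather than Postgres.
- Export for Analytics: Export traces to S3/CSV and query with Pandas/SQL for complex analysis.
- Tags for Filtering: Store `site_id` as a trace tag (e.g., `site:123`) for native filtering.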
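The `site:123` tag convention only helps if the tag is built the same way when traces are written and when they are queried. A minimal sketch of a shared helper follows; `SITE_TAG_PREFIX`, `site_tag`, and `site_id_from_tags` are hypothetical names, not part of the Langfuse SDK, and tags are assumed to be plain strings.

```python
from typing import Iterable, Optional

# Assumed convention from the bullet above: "site:<site_id>".
SITE_TAG_PREFIX = "site:"


def site_tag(site_id: str) -> str:
    """Build the tag to attach to a trace for the given site."""
    return f"{SITE_TAG_PREFIX}{site_id}"


def site_id_from_tags(tags: Optional[Iterable[str]]) -> Optional[str]:
    """Return the site ID encoded in the first site tag, or None if absent."""
    for tag in tags or []:
        if tag.startswith(SITE_TAG_PREFIX):
            return tag[len(SITE_TAG_PREFIX):]
    return None


print(site_tag("123"))
print(site_id_from_tags(["beta", "site:123"]))
```

Pass the output of `site_tag` both when creating traces and when filtering by tag, so the two sides cannot drift apart.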
- Langfuse Python SDK: https://langfuse.com/docs/sdk/python
- API Docs: https://api.reference.langfuse.com
- Self-Hosting Schema: https://langfuse.com/docs/deployment/self-host