Skip to content

Instantly share code, notes, and snippets.

@fpaupier
Created March 26, 2025 14:19
Show Gist options
  • Save fpaupier/9f8e6845aefaa58cfcc9f669f6f954d4 to your computer and use it in GitHub Desktop.
Save fpaupier/9f8e6845aefaa58cfcc9f669f6f954d4 to your computer and use it in GitHub Desktop.
load test schema
+-----------------------+ HTTP Requests +---------------------------------+ Trace Data +-------------------+
| Client de Load Test | -------------------> | vLLM OpenAI API Endpoint | -------------------> | Langfuse Cloud |
| (k6, Locust, script) | (Load Profile) | (Tournant sur H100 ou L40) | (SDK/API Call) | (UI & API) |
| | | (Modèle: Qwen / Llama3) | | |
| - Génère QPS | <------------------- | | <------------------- | - Visualisation |
| - Mesure Latence (Client) | Responses | - Traite les requêtes | User Query | - Analyse |
| - Mesure Taux Succès | | - Mesure Latence (Server) | | - Export |
+-----------------------+ | - Mesure Tokens In/Out | +-------------------+
+---------------------------------+
| ^
| | (Optionnel)
nvidia-smi query | GPU Metrics (Util, Mem)
V |
+-------+
| GPU |
+-------+
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment