Brent Salisbury (nerdalert)
$ kubectl get envoyfilter --all-namespaces -o yaml
apiVersion: v1
items:
- apiVersion: networking.istio.io/v1alpha3
  kind: EnvoyFilter
  metadata:
    creationTimestamp: "2025-07-10T05:32:40Z"
    generation: 1
    labels:

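To keep the dump readable, the same query can be narrowed to namespace and name first, then a single filter pulled as full YAML. A minimal sketch using kubectl's custom-columns output:

# List EnvoyFilter names without the full YAML dump
kubectl get envoyfilter --all-namespaces \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name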
vLLM Inference Simulator

Repo at llm-d/llm-d-inference-sim

1. Start the sim container

# podman can be substituted for docker
docker run --rm --net host ghcr.io/llm-d/llm-d-inference-sim \
  --port 8000 \
  --model "Qwen/Qwen2.5-1.5B-Instruct"
2025-06-24T04:04:35.7285268Z Current runner version: '2.325.0'
2025-06-24T04:04:35.7315212Z ##[group]Operating System
2025-06-24T04:04:35.7316005Z Ubuntu
2025-06-24T04:04:35.7316512Z 24.04.2
2025-06-24T04:04:35.7316951Z LTS
2025-06-24T04:04:35.7317533Z ##[endgroup]
2025-06-24T04:04:35.7318061Z ##[group]Runner Image
2025-06-24T04:04:35.7318693Z Image: ubuntu-24.04
2025-06-24T04:04:35.7319275Z Version: 20250615.1.0
2025-06-24T04:04:35.7320471Z Included Software: https://github.com/actions/runner-images/blob/ubuntu24/20250615.1/images/ubuntu/Ubuntu2404-Readme.md

Request rates swept: 10, 30, and inf (num prompts capped at 900)

The only difference between the commands is the metadata (the deployment name used for graphing):

./run-bench.sh --model meta-llama/Llama-3.2-3B-Instruct \
  --base_url http://llm-d-inference-gateway.llm-d.svc.cluster.local:80 \
  --dataset-name random \
  --input-len 1000 \
  --output-len 500

$ ENV_METADATA_GPU="4xNVIDIA_L40S" \
./e2e-bench-control.sh --4xgpu-minikube --model meta-llama/Llama-3.2-3B-Instruct

🌟 LLM Deployment and Benchmark Orchestrator 🌟
-------------------------------------------------
--- Configuration Summary ---
Minikube Start Args (Hardcoded): --driver docker --container-runtime docker --gpus all --memory no-limit --cpus no-limit
LLMD Installer Script (Hardcoded): ./llmd-installer.sh
Test Request Script (Hardcoded): ./test-request.sh (Args: --minikube, Retry: 30s)
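For reference, the rate sweep described above (10, 30, and inf, capped at 900 prompts) lines up with the flags vLLM's benchmark_serving.py accepts. The sketch below is an assumption about what one step of the sweep would look like if driven by that script directly; run-bench.sh may well use a different harness:

# Hypothetical direct invocation of vLLM's serving benchmark for one rate in the sweep
python3 benchmark_serving.py \
  --backend openai \
  --base-url http://llm-d-inference-gateway.llm-d.svc.cluster.local:80 \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --dataset-name random \
  --random-input-len 1000 \
  --random-output-len 500 \
  --request-rate 10 \
  --num-prompts 900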
ubuntu@ip-172-31-16-33:~/secret-llm-d-deployer/project$ kubectl logs -n kgateway-system kgateway-7c58ddd989-nw5wc -c kgateway --previous --tail=200
{"level":"info","ts":"2025-05-17T18:01:08.979Z","caller":"probes/probes.go:57","msg":"probe server starting at :8765 listening for /healthz"}
{"level":"info","ts":"2025-05-17T18:01:08.979Z","caller":"setup/setup.go:69","msg":"got settings from env: {DnsLookupFamily:V4_PREFERRED EnableIstioIntegration:false EnableIstioAutoMtls:false IstioNamespace:istio-system XdsServiceName:kgateway XdsServicePort:9977 UseRustFormations:false EnableInferExt:true InferExtAutoProvision:false DefaultImageRegistry:cr.kgateway.dev/kgateway-dev DefaultImageTag:v2.0.0 DefaultImagePullPolicy:IfNotPresent}"}
{"level":"info","ts":"2025-05-17T18:01:08.980Z","logger":"k8s","caller":"setup/setup.go:110","msg":"starting kgateway"}
{"level":"info","ts":"2025-05-17T18:01:08.984Z","logger":"k8s","caller":"setup/setup.go:117","msg":"creating krt collections"}
{"level":"info","ts":"2025-05-17T18:01
#!/usr/bin/env bash
# -*- indent-tabs-mode: nil; tab-width: 4; sh-indentation: 4; -*-
set -euo pipefail
### GLOBALS ###
NAMESPACE="llm-d"
PROVISION_MINIKUBE=false
PROVISION_MINIKUBE_GPU=false
STORAGE_SIZE="15Gi"
#!/usr/bin/env python3
"""
transcribe_video_to_srt.py
Transcribe a video or audio file into SRT subtitles using OpenAI Whisper.
Dependencies & Install:
------------------------------------
# 1. Create & activate a virtual environment (optional but recommended):
# python3 -m venv venv
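The preview cuts off mid-docstring. The remaining setup for a Whisper-based transcriber is typically along these lines; the final invocation and its argument are assumptions rather than the script's documented CLI:

# Typical environment setup for openai-whisper (ffmpeg is required for decoding media)
python3 -m venv venv
source venv/bin/activate
pip install -U openai-whisper
# Hypothetical invocation; the script's real arguments may differ
python3 transcribe_video_to_srt.py input.mp4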