The prometheus_client Python library uses a threading.Lock per label combination on every .labels().observe() call. This caused OOM crashes on aaa-api in staging: memory climbed past 700 MiB until the process was OOMKilled. The fix replaces the prometheus_client backend with OpenTelemetry, selected in gisual-prometheus-clients by setting the environment variable PROMETHEUS_BACKEND=otel.
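An env-var toggle like this can be sketched as a small factory that picks the backend when a metric is created. This is a minimal sketch, not the gisual-prometheus-clients API: `selected_backend`, `backend_class`, `make_histogram`, and both wrapper classes are hypothetical, and treating prometheus_client as the default when PROMETHEUS_BACKEND is unset is an assumption. The real library calls it relies on are prometheus_client's `Histogram(...).labels(...).observe(...)` and OpenTelemetry's `meter.create_histogram(...)` plus `.record(value, attributes=...)`. Both are imported lazily, so only the selected backend has to be installed.

```python
import os

# Hypothetical sketch; not the actual gisual-prometheus-clients API.
_VALID_BACKENDS = ("prometheus", "otel")


def selected_backend() -> str:
    """Read PROMETHEUS_BACKEND, defaulting to prometheus_client (assumed default)."""
    value = os.environ.get("PROMETHEUS_BACKEND", "prometheus").strip().lower()
    if value not in _VALID_BACKENDS:
        # Fail loudly instead of silently falling back to the old backend.
        raise ValueError(f"unsupported PROMETHEUS_BACKEND: {value!r}")
    return value


class PrometheusClientHistogram:
    """Wraps prometheus_client.Histogram (the original backend)."""

    def __init__(self, name: str, documentation: str, labelnames: tuple[str, ...]):
        from prometheus_client import Histogram  # lazy: only needed for this backend

        self._hist = Histogram(name, documentation, labelnames)

    def observe(self, value: float, **labels: str) -> None:
        # .labels() looks up, or creates on first use, one child per label combination.
        self._hist.labels(**labels).observe(value)


class OtelHistogram:
    """Wraps an OpenTelemetry histogram instrument (the PROMETHEUS_BACKEND=otel path)."""

    def __init__(self, name: str, documentation: str, labelnames: tuple[str, ...]):
        from opentelemetry import metrics  # lazy: only needed for this backend

        meter = metrics.get_meter("metrics-backend-sketch")  # hypothetical meter name
        self._hist = meter.create_histogram(name, description=documentation)
        self._labelnames = set(labelnames)

    def observe(self, value: float, **labels: str) -> None:
        # OTel does not enforce a fixed label set, so check it here to match
        # prometheus_client, which rejects unexpected label names.
        if set(labels) != self._labelnames:
            raise ValueError(f"expected labels {sorted(self._labelnames)}, got {sorted(labels)}")
        # Label values are passed as attributes on each record() call.
        self._hist.record(value, attributes=labels)


def backend_class() -> type:
    """Return the wrapper class for the backend PROMETHEUS_BACKEND selects."""
    return OtelHistogram if selected_backend() == "otel" else PrometheusClientHistogram


def make_histogram(name: str, documentation: str, labelnames: tuple[str, ...]):
    """Create a histogram on whichever backend PROMETHEUS_BACKEND selects."""
    return backend_class()(name, documentation, labelnames)
```

Call sites stay the same under either backend, e.g. `make_histogram("request_latency_seconds", "Request latency", ("route",)).observe(0.12, route="/health")`. For OTel measurements to appear in /metrics, a MeterProvider with a Prometheus-exporting reader (for example `PrometheusMetricReader` from opentelemetry-exporter-prometheus) must be configured at startup; without one, the default no-op provider discards the records. The sketch leaves that setup out.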
Validated in staging since 2026-03-22: aaa-api runs at 162 MiB (under its 368 MiB limit) with zero restarts, zero HTTP 500 errors, and all metrics present in the /metrics output.
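One way to spot-check the "all metrics present" result is to compare the metric family names exposed under each backend. This is a minimal stdlib sketch assuming the Prometheus text exposition format, where each family is declared by a `# TYPE <name> <type>` line. The function names and the URL mentioned in the docstring are hypothetical.

```python
import re
import urllib.request

# Matches "# TYPE <name> <type>" lines in the Prometheus text exposition format.
_TYPE_LINE = re.compile(r"^# TYPE (\S+) \S+$")


def metric_families(exposition: str) -> set[str]:
    """Return the metric family names declared by # TYPE lines."""
    names = set()
    for line in exposition.splitlines():
        match = _TYPE_LINE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def fetch_families(url: str) -> set[str]:
    """Fetch a /metrics endpoint (e.g. a hypothetical http://localhost:8000/metrics)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return metric_families(resp.read().decode("utf-8"))


def missing_families(expected: set[str], actual: set[str]) -> set[str]:
    """Family names seen under the old backend but absent under the new one."""
    return expected - actual
```

A typical use is to capture `fetch_families(...)` once with the prometheus_client backend and once with PROMETHEUS_BACKEND=otel, then check that `missing_families(old, new)` is empty. Family names can differ between exporters (for example in suffixes), so a non-empty result is a prompt to look closer rather than proof that a metric is gone.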