Deployed 25 backend services to staging to test metrics changes. Found and fixed 5 bugs introduced by library version mismatches. All services are now running healthy.
| Library | Version | Fix |
|---|---|---|
| gisual-task-runner | 0.6.25 | component_timer.labels() — pass all 4 labels (category, service, component, operation) |
| gisual-http-service | 1.13.21 | ShutdownCoordinator — inspect-based metrics kwarg safety + relaxed service-utils pin |
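
The label mismatch behind the task-runner fix can be reproduced with a small stand-in. The sketch below is only an illustration: `LabeledMetric` is a hypothetical class, not the `gisual-prometheus-clients` or Prometheus client API, and the label values are made up. It assumes the metric declares the four labels named in the table and rejects any call that passes a different number of values.

```python
class LabeledMetric:
    """Hypothetical stand-in for a labeled metric (not the real client API)."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)

    def labels(self, *values):
        # Reject calls whose value count differs from the declared labels.
        if len(values) != len(self.label_names):
            raise ValueError("Incorrect label count")
        return dict(zip(self.label_names, values))


# Assumed definition: the four labels listed in the table above.
component_timer = LabeledMetric(
    "component_timer", ("category", "service", "component", "operation")
)


def try_labels(metric, *values):
    """Return the bound labels, or the error message if the count is wrong."""
    try:
        return metric.labels(*values)
    except ValueError as exc:
        return str(exc)


# Old call shape (2 values) fails; new call shape (4 values) succeeds.
old_call = try_labels(component_timer, "task", "startup")
new_call = try_labels(component_timer, "task", "archive-tagger", "runner", "startup")
```

Passing all four values in the declared order is the shape of the 0.6.25 fix; the two-value call is what triggered the crash-loop.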

- Task runner crash-loop: `ValueError: Incorrect label count` on the `component_timer` startup metric. `gisual-prometheus-clients` redefined `component_timer` with 4 labels but `gisual-task-runner` only passed 2. Fixed in task-runner 0.6.25.
- intel-api 500s: `TypeError: StateManager.get_lock() got an unexpected keyword argument 'op_name'`. The `gisual-redis-client` uses `operation=`, not `op_name=`. Fixed 3 call sites in intel-api.
- outage-validation-task-manager 500s: Same `op_name`→`operation` mismatch. Fixed 15 call sites.
- data-collection-api crash-loop: `TypeError: ShutdownCoordinator.__init__() got an unexpected keyword argument 'metrics'`. `gisual-http-service` 1.13.19 unconditionally passes `metrics` but the installed `gisual-service-utils` didn't accept it. Fixed in http-service 1.13.21 with inspect-based detection.
- public-api / asset-outage-notifier Docker build failures: Local path dependencies in `[tool.uv.sources]` were breaking Docker builds. Removed path deps, pinned from PyPI.

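
The inspect-based detection mentioned for http-service 1.13.21 could look roughly like the sketch below. This is not the actual `gisual-http-service` code: the two coordinator classes, `accepts_kwarg`, and `build_coordinator` are hypothetical names used for illustration. The idea is to check the installed constructor's signature and pass `metrics` only when it is accepted.

```python
import inspect


class LegacyShutdownCoordinator:
    """Hypothetical older coordinator: no `metrics` parameter."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.metrics = None


class MetricsShutdownCoordinator:
    """Hypothetical newer coordinator: accepts `metrics`."""

    def __init__(self, timeout=30.0, metrics=None):
        self.timeout = timeout
        self.metrics = metrics


def accepts_kwarg(factory, name):
    """Return True if `factory` can be called with keyword argument `name`."""
    params = inspect.signature(factory).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def build_coordinator(factory, metrics, **kwargs):
    """Pass `metrics` only when the installed constructor supports it."""
    if accepts_kwarg(factory, "metrics"):
        kwargs["metrics"] = metrics
    return factory(**kwargs)


metrics = object()
legacy = build_coordinator(LegacyShutdownCoordinator, metrics, timeout=5.0)
modern = build_coordinator(MetricsShutdownCoordinator, metrics, timeout=5.0)
```

With this guard the older constructor is called without `metrics` instead of raising `TypeError`, which is also what allows a relaxed version pin on the dependency.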
All images on docker.gisual.net.
| Service | Image Tag | Fixes Applied |
|---|---|---|
| data-collection-api | :619ada55 | http-service 1.13.21 (was crash-looping) |
| public-api | :c3627d6f | http-service 1.13.21, removed path dep |
| intel-api | :91e5f764 | http-service 1.13.21, op_name→operation (3 sites) |
| incidents-api | :f0d73215 | http-service 1.13.21 |
| decisions-api | :8ec21d9a | http-service 1.13.21 |
| locations-api | :adefb2c4 | http-service 1.13.21 |
| assets-api | :04efddf9 | http-service 1.13.21 |
| outage-scans-api | :4dcf5fe5 | http-service 1.13.21 |
| outage-validation-api | :9cd8222f | http-service 1.13.21 |
| predictions-api | :58129b98 | http-service 1.13.21 |
| satellite-api | :0f855172 | http-service 1.13.21 |
| usage-api | :b58e7ec3 | http-service 1.13.21 |
| web-tool-api | :159a2462 | http-service 1.13.21 |
| dependencies-api | :f9af3051 | http-service 1.13.21 |

| Service | Image Tag | Fixes Applied |
|---|---|---|
| search-retry-feeder | :59b1be6c | task-runner 0.6.25 |
| outage-validation-task-manager | :90cf62d3 | task-runner 0.6.25, op_name→operation (15 sites) |
| regional-outage-feeder | :31a5be02 | task-runner 0.6.25 |
| customer-utility-updater | :52a8db85 | task-runner 0.6.25 |
| current-incidents-cache-pruner | :87de424f | task-runner 0.6.25 |
| archive-tagger | :f0eca8d4 | task-runner 0.6.25 |
| asset-outage-notifier | :ded6a1c7 | task-runner 0.6.25, removed gisual-runtime path dep |

notification-sender, outage-updater, asset-status-updater, alarm-lifecycle-manager — running healthy on existing images.
| Service | Pipeline | MR |
|---|---|---|
| data-collection-api | ✅ passed | !19 |
| decisions-api | ✅ passed | !114 |
| dependencies-api | ✅ passed | !6 |
| incidents-api | ✅ passed | !56 |
| locations-api | ✅ passed | !24 |
| public-api | ✅ passed | !69 |
| satellite-api | ✅ passed | !10 |
| alarm-lifecycle-manager | ✅ passed | !34 |
| search-retry-feeder | ✅ passed | !18 |
| current-incidents-cache-pruner | ✅ passed | !8 |
| asset-outage-notifier | ✅ passed | !6 |
| outage-validation-task-manager | ✅ passed | !23 |
| customer-utility-updater | ✅ passed | !6 |
| regional-outage-feeder | ✅ passed | !12 |
| Service | Pipeline |
|---|---|
| assets-api | ✅ passed |
| intel-api | ✅ passed |
| outage-scans-api | ✅ passed |
| outage-validation-api | ✅ passed |
| predictions-api | ✅ passed |
| usage-api | ✅ passed |
| web-tool-api | ✅ passed |
| notification-sender | ✅ passed |
| outage-updater | ✅ passed |
| asset-status-updater | ✅ passed |
| archive-tagger | ❌ failed (mypy type errors — pre-existing, unrelated to deploy) |
- dependencies-api needs redeployment to staging (running old image, healthy but on stale code)
- archive-tagger has pre-existing mypy failures (23 type errors in `appdb.py`/`tasks.py`) from stricter type stubs in updated dependencies
- public-api shutdown errors (`shutdown_opentelemetry`, `shutdown_logging`) are non-fatal but noisy: the methods are not implemented in the base class