@surminus
Last active May 14, 2026 15:58

Laura Martin — 12 Month Achievement Summary (May 2025 – May 2026)

1,260 commits across 20+ distinct work streams. For context, the entire infrastructure repository had roughly 1,800 total commits in this period. Laura authored approximately 70% of all infrastructure changes.


1. ConfigDBv2 — Complete Rewrite of Configuration Management

Built a replacement for the legacy DynamoDB-based configuration system from scratch, spanning Go packages, Ruby client, Terraform modules, Lambda functions, and CLI tooling.

Scope:

  • Created gopkg/configdbv2 package with full DynamoDB-backed configuration storage, rollback logic, changelog generation, metadata management, sensitive data masking, and test stubs
  • Implemented ConfigDBv2 in ruby/manager so the Manager service reads from and writes to the new system
  • Implemented AblyEnv::ConfigDBv2 in ruby/ably-env with CLI commands (config v2 get, validate, sync, clear-scope)
  • Created configdb-replicator Lambda function for cross-region replication
  • Added Cartography API views for ConfigDBv2 with changelog and audit endpoints
  • Built YACE metrics scraping for ConfigDBv2 DynamoDB tables and Lambda functions
  • Created Terraform module for ConfigDBv2 infrastructure
  • Rolled out progressively: sandbox → nonprod → prod (with rollbacks and re-enables along the way)
  • Added S3 failover for configuration reads in Manager
  • Removed legacy ConfigDB code (Go, Ruby, Manager, Terraform) after migration

Impact: Replaced a critical piece of Ably's infrastructure configuration system, affecting how every realtime cluster is configured and deployed.


2. ablyctl — CLI Migration to Feature Parity and Beyond

Migrated the entire ably-env Ruby CLI to ablyctl (Go), then extended it significantly beyond parity.

Migration work (INF-6858):

  • Created migration plan with Claude, tracked progress through multiple phases
  • Ported all major command groups: autoscaling, config, secrets, crisis, release, instance, terraform, rabbitmq, logs, admin, netmap, routing policies, clusters
  • Standardised command conventions with a style guide and CLAUDE.md
  • Extracted inline Run closures across every package for testability
  • Added shared flags package, toolkit helpers, confirmation prompts
  • Split large single-file packages into logical files

New capabilities beyond ably-env:

  • VictoriaMetrics querying (metrics command)
  • Loki log querying (logs command with LogQL)
  • Alertmanager querying (alerts command)
  • AWS EBS volume management
  • AWS subnet listing
  • CloudTrail event lookup
  • VM cardinality metrics
  • Scalr workspace management (terraform scalr)
  • WAF capacity checking
  • CloudWatch metrics querying
  • Lifecycle testing framework (SSM-based instance testing)
  • Auto-update functionality with S3 bucket distribution
  • Sentry error reporting and command telemetry
  • Shell completion for all commands (cluster, region, site)
  • Claude agent safety hook (permission gating for automated operations)
  • Interactive version selection for config rollback
  • Parallel instance connections and commands
  • --plain and --tail flags for log output
  • Config edit command

Impact: The primary infrastructure operations tool used by the entire infrastructure team, now in Go with better performance, testability, and extensibility.


3. Backup Infrastructure (INF-6399 / INF-7390)

End-to-end backup solution across all AWS regions for EBS, RDS, and Cassandra.

Scope:

  • Created backup-vault and backup-plan Terraform modules
  • Deployed backup vaults across all 13 nonprod regions and all prod regions
  • Deployed backup plans across all nonprod and prod regions
  • Upgraded backup modules to AWS Provider v6
  • Created Scalr workspaces for backup-nonprod and backup-prod
  • Bootstrapped backup AWS accounts (prod-backup, nonprod-backup)
  • Added SSO access for backup accounts
  • Created DR backup buckets
  • Enabled offsite backups for Cassandra, EBS (vmstorage), and RDS
  • Added KMS key management for backup plans
  • Configured cross-account backup vault access with unique SIDs per statement
  • Created ablyctl backup commands

Impact: Ably now has cross-region disaster recovery backups for all critical data stores, a capability that didn't exist before.


4. Crypt Backup

Built a complete automated secrets backup system.

Scope:

  • Created crypt-backup Lambda function (Go)
  • Implemented the go/pkg/crypt/server server-side crypt package
  • Added regional and S3 failover for secret reads
  • Added Manager-level regional failover and S3 failover for secrets
  • Integrated Sentry for error tracking
  • Added YACE metrics scraping for the Lambda
  • Created alerting for Lambda function errors
  • Added an automatic secrets backup command to the CLI
  • Deployed in prod with monitoring
  • Fixed edge cases (deleting recreated secrets, YAML round-trip integer values)

Impact: Automated backup of all encrypted secrets with multi-region failover, protecting against data loss in the secrets management system.
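
The regional and S3 failover for secret reads can be pictured with a small sketch. This is not the crypt package's code: the Source interface, the in-memory mapSource, and readWithFailover are hypothetical names, and the sketch assumes the simplest policy, which is to try each source in order and return the first successful read.

```go
package main

import (
	"errors"
	"fmt"
)

// Source is a hypothetical interface over one place a secret can be read from.
type Source interface {
	Name() string
	Get(key string) (string, error)
}

// mapSource is an in-memory Source used for illustration only.
type mapSource struct {
	name string
	data map[string]string
	down bool
}

func (m mapSource) Name() string { return m.name }

func (m mapSource) Get(key string) (string, error) {
	if m.down {
		return "", errors.New("source unavailable")
	}
	v, ok := m.data[key]
	if !ok {
		return "", errors.New("key not found")
	}
	return v, nil
}

// readWithFailover tries each source in order and returns the first
// successful read, along with the name of the source that served it.
// If every source fails, the individual errors are joined and returned.
func readWithFailover(key string, sources ...Source) (value, servedBy string, err error) {
	if len(sources) == 0 {
		return "", "", errors.New("no sources configured")
	}
	var errs []error
	for _, s := range sources {
		v, getErr := s.Get(key)
		if getErr == nil {
			return v, s.Name(), nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", s.Name(), getErr))
	}
	return "", "", errors.Join(errs...)
}

func main() {
	secrets := map[string]string{"db-password": "example"}
	primary := mapSource{name: "primary-region", down: true}
	secondary := mapSource{name: "secondary-region", down: true}
	backup := mapSource{name: "s3-backup", data: secrets}

	_, servedBy, err := readWithFailover("db-password", primary, secondary, backup)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("served by", servedBy)
}
```

This sketch fails over on any error, including a missing key. A real implementation might fail over only when a source is unavailable, so that a key deleted from the primary is not silently served from an older backup.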


5. tfgen — Terraform Generation Tool

Built a new tool to replace CDKTF for generating Terraform configuration.

Scope:

  • Created go/pkg/tfgen package from scratch
  • Added tfgen realtime package for realtime cluster Terraform generation
  • Integrated into infratool (go/tools/infratool: add tfgen command)
  • Integrated into ablyctl (go/tools/ablyctl: use tfgen for terraform realtime)
  • Integrated into ably-env (ruby/ably-env: use tfgen for terraform realtime)
  • Removed CDKTF for realtime stacks
  • Generated Terraform for all realtime clusters

Impact: Eliminated the CDKTF dependency (TypeScript/Node.js) for Terraform generation, replacing it with a native Go tool that integrates directly into the infrastructure toolchain.


6. Observability Infrastructure

Major overhaul of logging and monitoring systems.

Self-hosted Loki:

  • Deployed Loki across nonprod and prod
  • Created Loki containers with tuning (query timeout, gRPC message size, clustering)
  • Added memcached caching (scaled up multiple times)
  • Deployed Loki queriers for read scaling
  • Added external NLB for Loki access
  • Created basic-auth-proxy (NGINX) for authentication
  • Built Logs Cluster View Grafana dashboard
  • Switched default data source from Grafana Cloud to self-hosted Loki
  • Deleted Grafana Cloud Logs resources after migration

Vector migration (from Promtail):

  • Added Vector to AMI, deploy manifests, and manager configuration
  • Created Vector aggregator container and configuration
  • Added disk buffering, throttling, and pipeline configuration
  • Added Vector monitoring alerts
  • Deprecated and removed Promtail

VictoriaMetrics:

  • Upgraded vmstorage instance types (x2gd.large to r7g.xlarge)
  • Added vmselect instances
  • Enabled EBS snapshots and backups for vmstorage
  • Deprecated vmbackupmanager
  • Created victoriametrics-exporter with tests and mocking support
  • Added vm commands to ablyctl

Grafana dashboards:

  • Migrated multiple dashboards to Grafonnet (CloudFront, Data Center Instance View)
  • Created Cassandra Cluster View, Realtime Container View, Logs Cluster View
  • Updated Queue Cluster View with percentages and published message rates

Impact: Moved the entire logging pipeline from Grafana Cloud to self-hosted Loki (cost reduction), replaced Promtail with Vector (better performance/reliability), and improved monitoring dashboards across the board.


7. RabbitMQ — Quorum Queue Migration and Tooling

Migrated push queues to quorum queues and built comprehensive RabbitMQ management tooling.

Scope:

  • Created RabbitMQ cluster-config Terraform module
  • Migrated push queues to quorum (sandbox → nonprod → prod)
  • Created push-quorum vhost and associated resources
  • Upgraded RabbitMQ to 4.2
  • Migrated from rabbitmq.config to rabbitmq.conf
  • Removed custom auth plugin
  • Enabled shovel plugin
  • Built rabbitmq migrate-to-quorum ably-env command with skip-broken-queues resilience
  • Added delete-queue, delete-queues-matching, --filter to list-queues
  • Added list-users, --json output, queue type display
  • Fixed quorum migration for broken shovel status API and unacked queues
  • Fixed Reaper for RabbitMQ 4.x
  • Added RabbitMQ load testing enhancements to ably-env
  • Added ablybench rabbit receiver for benchmarking
  • Updated Grafana dashboards for queue monitoring
  • Rationalised queue module security groups and ports

Impact: Migrated to quorum queues for better data safety and fault tolerance, with comprehensive tooling for ongoing RabbitMQ operations.


8. Log Processor Performance Optimisation

Systematic performance tuning of the log processor.

Changes:

  • Hoisted regex compilation to package level
  • Flushed CSV once after processing instead of per line
  • Cached AllFields() result instead of recomputing per line
  • Pre-allocated and reused CSV row slice
  • Eliminated double-buffered compressed data
  • Pre-sized LogData maps to avoid growth during population
  • Right-sized compressed buffer to avoid over-allocation
  • Fixed missing break in S3 download retry loop
  • Removed unused FloatRegex and CSV_EOL

Impact: Reduced CPU and memory allocation in a Lambda function that processes every log line from every realtime instance.


9. Enterprise Monitoring

Built dynamic enterprise monitoring from Terraform modules.

Scope:

  • Created enterprise-monitoring Terraform module
  • Built adminapi Go package with tests (account monitoring, app monitoring resources)
  • Created adminapi Terraform provider resources
  • Added enterprise recording rules and alert expressions
  • Enabled monitoring per-account via Terraform
  • Added business hours paging configuration
  • Created enterprise monitoring dashboard
  • Migrated from legacy static monitoring to dynamic per-account monitoring
  • Added enterprise account reporting to ably-env (with --markdown, --csv, --missing flags)

Impact: Enterprise customer monitoring is now managed through Terraform rather than manual configuration, with automatic onboarding.


10. Cartography API

Extended the internal infrastructure API with multiple new capabilities.

Scope:

  • Added /v1/accounts endpoint with DynamoDB storage and tests
  • Added ConfigDBv2 views with changelog, audit, and /current endpoints
  • Added playbooks API endpoint with linting (INF-6974)
  • Added playbooks client and OpenAPI spec regeneration
  • Added cluster views (security, observability, deployment, queue, Cassandra)
  • Added config filtering, placement constraints view fixes
  • Added cluster name alias redirects
  • Migrated playbooks from Confluence to repository-based system

Impact: Cartography API is now the central infrastructure intelligence service, with configuration, playbooks, and account data all queryable.


11. Legacy Naming Migration (INF-7110)

Migrated from legacy "environment/data_center" naming to "cluster/site" across multiple systems.

Scope:

  • Migrated ably-env CLI commands to new naming
  • Updated alertmanager templates to handle cluster and site
  • Updated vmalert rules to use cluster and site in expressions
  • Migrated Grafana dashboards to new naming
  • Updated enterprise monitoring to use cluster label
  • Migrated Concourse pipeline naming
  • Added clusters naming command and debugging skills to ably-env

Impact: Standardised naming conventions across the infrastructure stack, eliminating confusion between legacy and modern terminology.


12. AWS Provider v6 Migration

Systematic upgrade of Terraform modules from AWS Provider v5 to v6.

Scope:

  • Upgraded backup-vault, backup-plan modules
  • Upgraded website modules (nonprod and prod)
  • Created Provider v6 upgrade guide and Copilot instructions
  • Piloted on crypt-backup and configdb nonprod
  • Enabled same-region RDS backups as part of the upgrade

Impact: Keeping Terraform modules current with the latest AWS provider, unblocking new AWS features.


13. Deployment Infrastructure

Rebuilt deployment infrastructure (Concourse CI).

Scope:

  • Created new deployment module (Terraform)
  • Created modern deployment cluster structure
  • Migrated Concourse to new nonprod and prod infrastructure
  • Created deployment DNS zones
  • Fixed Concourse worker for Ubuntu 24.04
  • Added autoscaling parameters
  • Upgraded deployment RDS

Impact: Modernised the CI/CD deployment infrastructure.


14. CI/CD Improvements

Significant improvements to the build and test pipeline.

Scope:

  • Refactored CI workflows (renamed "CI" to "Containers", separated manager/ably-env/on-call-dashboard)
  • Created dynamic matrix for per-container build jobs
  • Added Docker buildx caching to GHA backend
  • Fixed multi-arch manifest race condition
  • Created infratool Go CI tool to replace Rake
  • Added pull_request triggers for reliable PR path filtering
  • Added infratool image check and --json commands
  • Added Copilot setup steps for GitHub
  • Moved to standardrb for Ruby linting

Impact: Faster, more reliable CI pipelines with better container build caching and matrix parallelism.


15. Service Framework (go/server, equip, gamekeeper)

Created a framework for building and packaging Go services.

Scope:

  • Created go/server/ directory structure with packaging workflow
  • Built go/pkg/cli standard entrypoint for server CLI tools
  • Created equip CLI for service unit file generation
  • Added parrot example service
  • Created gktools assert package with tests (running containers, server testing)
  • Created host package
  • Added CI for uploading gktools
  • Restructured assert to use the cli package
  • Introduced d/ctl service pattern with port registry

Impact: Standardised framework for building, testing, and deploying infrastructure services.


16. Repository Structure

Major restructuring of the repository.

Scope:

  • Restructured the repository into a monorepo
  • Migrated Go modules (go.mod to go/go.mod)
  • Moved terraform providers to go/terraform/
  • Moved containerised services to go/services/
  • Moved ablybench, ablyctl, infratool to go/tools/
  • Migrated ablyaws from ablyctl to shared gopkg
  • Created shared Go packages (configdb, configdbv2, crypt, ablyaws, adminapi)
  • Created AWS interface generator for mocking
  • Added architecture documentation in the docs/ directory

Impact: Cleaner repository structure with shared packages enabling code reuse across tools and services.


17. Security

Scope:

  • Created WAF rule to block CloudFront host header requests
  • Made WAF rule action configurable (count → block progression)
  • Added pusher-pubnub regex pattern set
  • Scoped credentials per caller for ably-env and ablyctl (INF-6938)
  • Built Claude agent safety hook (permission gating)
  • Removed orbit rule group from prod WAF

Impact: Improved security posture for public-facing infrastructure and internal tooling.


18. On-Call Dashboard

Built an internal dashboard for on-call management.

Scope:

  • Created incident timeline and statistics views
  • Added calendar view
  • Built incident view and index page
  • Added in-memory caching
  • Improved styling and routing
  • Fixed incident paging

19. Claude/AI Tooling

Built infrastructure-specific AI tooling and skills.

Scope:

  • Created CLAUDE.md files across the repository (root, Go, Ruby, Terraform, ablyctl, modules)
  • Built skills: jira-ticket, backlog, query-alerts, query-metrics, query-logs, debug-live, test-manager, dev-cluster
  • Added agent permission checker hook for safe automated operations
  • Created automatic copilot feedback loop for jira-ticket skill
  • Shared step scaling investigation and AI workflow learnings with team

20. Customer and Operational Work

Throughout the year, ongoing operational work:

  • Customer onboarding: Duolingo, Gorgias CNAMEs, Hivebrite, Kraken, EA, Bloke Design, Reflag, Leya, Aristocrat, Lightspeed scaling
  • AMI upgrades and rollouts (multiple cycles)
  • Terraform module updates across all clusters (realtime, observability, cassandra, queue, security, deployment, api, website)
  • Incident response tooling (incident.io integration, alertmanager webhook config)
  • Realtime upgrades (7.27, 7.32, 7.54)
  • Docker upgrades and fixes
  • Manager bug fixes (ECR retry, Docker API changes, credentials, container age metrics, Reaper)
  • ably-env bug fixes (config, crisis, rollback, scaling, validation)

Summary

Area | Scale
Total commits | 1,260
Share of all repo commits | ~70%
Major systems built from scratch | ConfigDBv2, crypt-backup, tfgen, backup infrastructure, enterprise monitoring, service framework, on-call dashboard
Major migrations completed | ablyctl (Ruby → Go), Loki (Grafana Cloud → self-hosted), Vector (Promtail replacement), quorum queues, Provider v6, legacy naming, monorepo restructure
Terraform modules created | backup-vault, backup-plan, deployment, enterprise-monitoring, cloudflare-exporter, configdb, rabbit cluster-config
Go packages created | configdbv2, configdb, crypt/server, tfgen, adminapi, cli, host, servertest, lifecycle, equip
CLI commands added | 40+ new commands across ablyctl and ably-env
Grafana dashboards | 6 created or migrated to Grafonnet

Promotion Case: Laura Martin

Current role: Site Reliability Engineer, Infrastructure Team
Period under review: May 2025 – May 2026
Date: 14 May 2026


Summary

In the last 12 months I have authored 1,260 commits to the infrastructure repository, approximately 70% of all changes. I designed and delivered multiple company-critical systems from scratch, led the largest CLI migration in the company's history, and overhauled the entire observability stack. I did this while maintaining on-call responsibilities, reviewing my teammates' work, and keeping the platform running day-to-day.

The scope, complexity, and independence of this work consistently exceed my current level. Below I map specific achievements to the Ably engineering progression framework.


Evidence Mapped to Engineering IC5 Criteria

The Eng-IC5 level description: "You are a technical leader who owns large parts of the system. You will be driving your team's technical strategy and decision making."


Knowledge

"You are sought after for knowledge and guidance of service level components."

I am the named reviewer on teammates' PRs across the infrastructure repository (visible in the GitHub notification patterns in #team-infrastructure). When PRs from Matt Hammond, Simon Woolf, Martin Schon, or Steven Lindsay need review, I am personally tagged alongside the team group. My review coverage spans Go, Ruby, Terraform, containers, CI, and monitoring configuration. There is no area of the infrastructure stack where I am not comfortable reviewing and providing guidance.

"You are not satisfied with the status quo; you initiate and deliver changes that positively impact the wider team."

  • Crypt backup: I identified the risk of secrets data loss and self-initiated the entire solution (Lambda function, regional failover, S3 failover, monitoring, alerting). This was not requested or assigned.
  • tfgen: I identified CDKTF as a maintenance burden (TypeScript/Node.js dependency in a Go/Ruby infrastructure team) and built a native Go replacement from scratch, then removed CDKTF entirely.
  • Architecture documentation: I created the docs/ directory and established the practice of storing architecture documentation alongside the code, rather than in Confluence where it drifts from reality.
  • Claude skills: I built 8+ infrastructure-specific skills (jira-ticket, backlog, query-alerts, query-metrics, query-logs, debug-live, test-manager, dev-cluster) that encode institutional knowledge into reusable tools, reducing reliance on any single person's memory.
  • Agent safety hook: I built the permission-gating system for automated Claude operations on infrastructure, enabling safe AI-assisted operations without compromising production safety.

"You are a technical leader and own large parts of the system or services."

I own, in the sense of having built or substantially rebuilt and being the primary maintainer of:

  • ablyctl — the primary infrastructure operations CLI
  • ConfigDBv2 — the configuration management system for all realtime clusters
  • Backup infrastructure — cross-region DR for all critical data stores
  • Crypt backup — automated secrets backup with multi-region failover
  • tfgen — Terraform generation replacing CDKTF
  • Enterprise monitoring — dynamic per-account monitoring via Terraform
  • Cartography API — the infrastructure intelligence service (accounts, config views, playbooks)
  • Log processor — the Lambda that processes all realtime log data
  • On-call dashboard — internal incident management tool

I am the second most prolific contributor to the infrastructure repository in its entire history (5,060 commits), behind only the CEO (5,141), who has been here since founding in 2016. I joined in October 2021.

"You have a holistic and in-depth understanding of organisational context and apply it to your decision making."

The breadth of my work in the last 12 months spans every cluster type (realtime, observability, deployment, security, queue, api, website), every environment (sandbox, nonprod, prod), and every major subsystem (configuration, secrets, backups, logging, metrics, alerting, RabbitMQ, Cassandra, DNS). I have shipped production work in Go, Ruby, Terraform/OpenTofu, Grafana/Grafonnet, Docker, Bash, and YAML configuration. I have written Lambda functions, CLI tools, Terraform providers, Terraform modules, API endpoints, systemd service frameworks, container definitions, CI pipelines, and Grafana dashboards.

When making decisions (e.g. tfgen architecture, ConfigDBv2 rollout strategy, backup account structure), I consider the operational impact on the team, the deployment safety for production, and the long-term maintainability of the solution.


Ownership & Delivery

"You lead core, wide-impacting projects and excel in their delivery."

Project | Scope | Outcome
ablyctl migration (INF-6858) | Ported the entire ably-env Ruby CLI to Go, added 40+ new commands beyond parity | Now the primary infrastructure operations tool for the team
ConfigDBv2 | Built replacement for the core configuration system (Go package, Ruby client, Terraform module, Lambda replicator, CLI commands, Cartography views) | Rolled out to all environments, legacy code removed
Backup infrastructure (INF-6399/7390) | Designed and deployed cross-region DR backups across all 13 regions | Ably now has disaster recovery for Cassandra, EBS, and RDS (previously non-existent)
Observability overhaul | Migrated logging from Grafana Cloud to self-hosted Loki, replaced Promtail with Vector, upgraded VictoriaMetrics, created/migrated 6 Grafana dashboards | Cost reduction, improved reliability, better dashboards
RabbitMQ quorum migration | Created cluster-config module, migrated push queues sandbox → nonprod → prod, upgraded to 4.2, built comprehensive tooling | Better data safety and fault tolerance

Every one of these projects was driven to production. None are prototypes or experiments.

"You set the technical direction for the service(s) you own."

  • ablyctl: I defined the style guide, command conventions, package structure, and CLAUDE.md that govern how the CLI is developed. I made the architectural decisions (shared flags package, toolkit helpers, interface-based AWS mocking, Sentry telemetry).
  • Repository structure: I led the monorepo migration, established the shared Go packages pattern (configdb, configdbv2, crypt, ablyaws, adminapi), and created the AWS interface generator for testable code.
  • CI/CD: I refactored the CI pipelines (dynamic matrix builds, Docker caching, workflow separation), created infratool as a Go replacement for Rake, and established the container build patterns.

"You have transitioned from a maker to a force multiplier."

While I still write a lot of code, a significant proportion of my work multiplies the team's output:

  • Claude skills encode my knowledge into tools anyone can use (query-alerts, query-metrics, debug-live)
  • CLAUDE.md files across the repository mean AI agents and new team members can work effectively without asking me
  • Playbooks migration moved operational knowledge from Confluence to a structured, version-controlled, machine-readable format
  • Architecture documentation in docs/ gives the team context that was previously only in people's heads
  • ablyctl gives every team member capabilities (metrics querying, log querying, alert querying, lifecycle testing) that previously required manual AWS console access or separate tooling

"You are proactive in identifying broad problems and help to solve them."

  • Identified the absence of DR backups and built the entire solution
  • Identified CDKTF as a maintenance burden and built the replacement
  • Identified secrets backup risk and built crypt-backup
  • Identified the legacy naming inconsistency and drove the migration
  • Identified the need for agent safety controls and built the permission hook

"You establish and communicate team wide standards and practices."

  • Created CLAUDE.md files that define coding conventions, commit practices, and PR standards used daily by the team
  • Defined the ablyctl style guide and command conventions
  • Established the shared Go packages pattern for code reuse
  • Created the Terraform module change guidelines
  • Contributed thoughtfully to team discussions on AI-assisted development, PR review workflows, and commit health (visible in #team-infrastructure)

Communication & Leadership

"You frame technical discussions and lead the team to a consensus."

My Slack post in #team-infrastructure about PR review workflows and commit health was one of the most substantive team discussions in recent months. I articulated a specific concern ("I'm really mindful that we don't lose important context in our git history to be replaced with the same generic slop comments"), proposed a concrete workflow, shared my own tooling (Claude skills for git), and engaged with Paddy's feedback constructively.

When building ConfigDBv2, I made progressive rollout decisions (sandbox → nonprod → prod), rolled back when issues appeared, re-enabled with fixes, and communicated the status throughout. The system is now live in production because the rollout was managed carefully, not rushed.

"You help cultivate a healthy engineering culture and have a positive influence on those around you."

  • I share my experiments and learnings with the team (step scaling investigation, AI workflow patterns, Claude skills)
  • I review teammates' PRs consistently and am a named reviewer on their work
  • I take on-call shifts and handle them responsibly
  • I help across team boundaries (DevEx support, IT support, contributing to wider discussions)
  • I maintain a direct, honest communication style (per my user manual: "very happy with asynchronous chats, happy to jump on a call to quickly discuss stuff")

Evidence Mapped to IC5 Impact Criteria

The company-wide framework added Impact (Complexity, Scope, Ownership) in late 2025.

Complexity: ConfigDBv2, tfgen, backup infrastructure, and crypt-backup are not incremental features. They are foundational infrastructure components that other systems depend on. ConfigDBv2 affects how every realtime cluster is configured. The backup infrastructure protects all critical data stores. tfgen replaced an entire build toolchain. The log processor optimisation required understanding Go memory allocation patterns at a level of detail that few engineers at the company work at.

Scope: My work spans the full infrastructure stack: AWS accounts, Terraform modules, Go services, Ruby tooling, container definitions, CI pipelines, monitoring configuration, Grafana dashboards, and operational playbooks. In the last 12 months I have touched every cluster type, every environment, and every major subsystem.

Ownership: 70% of all infrastructure repository commits in the last 12 months. Primary author or sole author of 7 systems built from scratch. Led every major infrastructure migration. On-call. PR reviewer. Production incident responder. Customer onboarding. The infrastructure team's output is, to a measurable degree, my output.


What This Means

Ably's progression framework states that promotion requires performing consistently, continuously, and comfortably at approximately 80% of the next level's expectations.

  • Consistently: This is not a single good quarter. 1,260 commits across 12 months, spanning 20+ work streams.
  • Continuously: Every project was driven to production. ConfigDBv2 is live. ablyctl is the primary operations tool. Backup infrastructure protects all regions. Loki is the default log source. These are production systems the company depends on.
  • Comfortably: The volume and breadth of work (70% of all changes, spanning Go/Ruby/Terraform/containers/CI/monitoring) demonstrates that this scope is not a stretch. This is my normal operating tempo.

For reference, here is how I map against every Eng-IC5 criterion:

Eng-IC5 Criterion | Status | Key Evidence
Sought after for knowledge and guidance | Doing regularly | Named PR reviewer across all areas, no stack gaps
Not satisfied with status quo; initiates positive changes | Doing regularly | Self-initiated crypt-backup, tfgen, architecture docs, Claude skills, agent safety hook
Technical leader who owns large parts of the system | Doing regularly | Primary owner of ablyctl, ConfigDBv2, backup infra, crypt-backup, tfgen, enterprise monitoring, cartography, log processor, on-call dashboard
Holistic understanding of organisational context | Doing regularly | Work spans every cluster type, environment, subsystem, and language in the infrastructure stack
Leads core, wide-impacting projects | Doing regularly | ablyctl, ConfigDBv2, backup infra, observability overhaul, RabbitMQ quorum migration
Sets technical direction for owned services | Doing regularly | ablyctl conventions, repo structure, CI patterns, shared packages, module guidelines
Transitioned from maker to force multiplier | Doing regularly | Claude skills, CLAUDE.md files, playbooks migration, architecture docs, ablyctl capabilities
Proactive in identifying broad problems | Doing regularly | DR backups, CDKTF replacement, secrets backup, legacy naming, agent safety
Establishes and communicates team-wide standards | Doing regularly | CLAUDE.md files, ablyctl style guide, Go package patterns, Terraform module guidelines
Frames technical discussions and leads to consensus | Doing regularly | PR workflow discussion, ConfigDBv2 rollout management
Cultivates healthy engineering culture | Doing regularly | Shares learnings, reviews PRs, on-call, cross-team help

Contributions Summary

Area | Scale
Total commits (12 months) | 1,260
Share of all repo commits | ~70%
All-time commits (since Oct 2021) | 5,060 (2nd in repo history)
Systems built from scratch | ConfigDBv2, crypt-backup, tfgen, backup infrastructure, enterprise monitoring, service framework, on-call dashboard
Migrations led | ablyctl (Ruby to Go), Loki (Grafana Cloud to self-hosted), Vector (Promtail replacement), quorum queues, Provider v6, legacy naming, monorepo restructure
Terraform modules created | backup-vault, backup-plan, deployment, enterprise-monitoring, cloudflare-exporter, configdb, rabbit cluster-config
Go packages created | configdbv2, configdb, crypt/server, tfgen, adminapi, cli, host, servertest, lifecycle, equip
CLI commands added | 40+ new commands across ablyctl and ably-env
Grafana dashboards | 6 created or migrated to Grafonnet
Claude skills built | 8+ infrastructure-specific skills

Closing

I have been at Ably since October 2021. In that time I have become the second most prolific contributor to the infrastructure repository in its entire history. In the last 12 months specifically, I have authored 70% of all infrastructure changes, built multiple company-critical systems from scratch, led every major infrastructure migration, and maintained the operational health of the platform.

I am not asking for recognition of potential. I am asking for recognition that reflects the level at which I am already operating.
