A practical guide to infrastructure, deployments, and operations for engineers joining the team or looking to understand the full picture.
- Repository Map
- How Deployments Work
- How Terraform is Managed
- How Kubernetes is Managed
- How Database Migrations Work
- Environment Configuration
- Secrets Management
- Monitoring and Observability
## Repository Map

### Infrastructure as Code (Terraform)

| Repository | Purpose |
|---|---|
| devops_aws_terraform | Terraform for all AWS resources — EKS clusters, IAM, S3, ECR, Glue, Redshift, AppConfig, and more. 22 modules covering the full AWS footprint. |
| devops_github_terraform | Terraform for GitHub organization management — repository settings, branch protection rules, team membership, secrets, and environment configuration. |
| devops_terraform_initiators | GitHub Actions workflows that let you trigger Terraform plan/apply operations from the Actions UI without needing local AWS credentials. |
### CI/CD Pipelines

| Repository | Purpose |
|---|---|
| devops_pipelines | The engine. All reusable GitHub Actions workflows, build scripts, deployment scripts, Helm templates, Docker configurations, and service config files. This is where CI/CD logic lives. |
| devops_pipeline_initiators_eng | The steering wheel for engineering. User-friendly GitHub Actions workflows with dropdown menus for deploying services and web apps to lower environments. |
| devops_pipeline_initiators_prod | Same concept, but for production deployments with additional approval gates. |
| devops_pipeline_initiators_devops | DevOps-specific deployment workflows for infrastructure tools and maintenance operations. |
| devops_qa_pipelines | QA-specific CI/CD pipelines for test automation and validation workflows. |
### Configuration and GitOps

| Repository | Purpose |
|---|---|
| env_files_all | Environment-specific configuration files for every service across every environment. The traditional Helm deployment path reads values from here. |
| appconfig_toolkit | Python CLI for managing AWS AppConfig — syncing application settings, detecting drift, rendering templates. Covers 47 services across 4 tech stacks. |
| g360_argocd | ArgoCD bootstrap configuration — cluster setup, platform apps (cert-manager, Karpenter, ALB controller, GitHub ARC runners), RBAC policies. |
| g360_env_configs | GitOps config repository. ArgoCD watches this repo; CI/CD pipelines update image tags and metadata here, triggering automatic syncs. |
| g360_helm_charts | Shared Helm library charts and service chart definitions used across deployments. |
### Utilities

| Repository | Purpose |
|---|---|
| devops_helper_scripts | Utility scripts — workflow usage analysis, GitHub org backups, Kafka schema management, build validation tools. |
| devops_sysadmin_scripts | System administration tools and operational runbooks. |
## How Deployments Work

Deployments are triggered through initiator repositories — GitHub Actions workflows with guided dropdown menus that abstract away the underlying complexity. Engineers don't need to know Helm, Terraform, or Kubernetes to deploy.
For engineering teams: Go to devops_pipeline_initiators_eng, click Actions, and select the appropriate workflow.
| Workflow | What it Deploys | Services |
|---|---|---|
| ENG \| SVC | Backend microservices | 26 services (approval, booking, payment, identity, search, etc.) |
| ENG \| WEBAPP \| MP | Marketplace web apps | 13+ apps (dashboard, search, sourcing, inventory, etc.) |
| ENG \| WEBAPP \| HILTON | Hilton private-label web apps | Brand-specific frontend builds |
| ENG \| WEBAPP \| IHG | IHG private-label web apps | Brand-specific frontend builds |
| ENG \| WEBAPP \| WYNDHAM | Wyndham private-label web apps | Brand-specific frontend builds |
| ENG \| INFRA Tools | Infrastructure utilities | eks_deploy_info |
| ENG \| ETL | ETL services | leonardo_service |
| ENG \| SVC Auto-Migration | Services with forced DB migrations | Multi-language with validation |
When you trigger a deployment, you provide:
- Repository/service — dropdown selection (no free-text, prevents typos)
- DB modifications — optional; whether to run database migrations
- Branch or SHA — toggle between deploying a branch tip or a specific commit
- Branch name / SHA value — the actual ref to deploy
- Environment — target environment (dev52, qa63, uat71, etc.)
The initiator validates your inputs (SHA format, branch existence via GitHub API), then calls a reusable workflow in devops_pipelines.
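For intuition, here is a minimal Python sketch of the kind of pre-flight validation the initiator performs. The real checks live inside the GitHub Actions workflow; the function name and error handling here are hypothetical, but the GitHub REST API endpoint is the standard branch lookup.

```python
import re
import requests  # pip install requests

GITHUB_API = "https://api.github.com"

def validate_ref(org: str, repo: str, ref: str, use_sha: bool, token: str) -> None:
    """Reject malformed SHAs and nonexistent branches before the pipeline starts."""
    headers = {"Authorization": f"Bearer {token}"}
    if use_sha:
        # A full commit SHA is 40 hexadecimal characters
        if not re.fullmatch(r"[0-9a-f]{40}", ref):
            raise ValueError(f"'{ref}' is not a valid 40-character commit SHA")
    else:
        # Branch existence check via the GitHub REST API
        resp = requests.get(f"{GITHUB_API}/repos/{org}/{repo}/branches/{ref}", headers=headers)
        if resp.status_code == 404:
            raise ValueError(f"Branch '{ref}' does not exist in {org}/{repo}")
        resp.raise_for_status()
```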
Once triggered, the pipeline runs through these phases:
```
Initiator Workflow (devops_pipeline_initiators_eng)
        │
        ▼
Base Workflow (devops_pipelines/cicd-base-svc-v1.yml)
        │
        ├─► CI Phase: Build → Test → Lint → Containerize → Push to ECR
        │
        ├─► DB Phase (optional): Run database migrations
        │
        └─► CD Phase: Deploy to target environment
                │
                ├─► GitOps path (uat71): Update g360_env_configs → ArgoCD syncs
                │
                └─► Helm path (all others): Helm upgrade via EKS admin container
```
### 1. Traditional Helm Deployment (all environments except uat71)

The pipeline:

- Checks out env_files_all for environment-specific values
- Loads service configuration from `repo_artifacts/vars/cicd-per-repo/{service}.txt`
- Prepares the Helm chart from templates
- Runs `helm upgrade` inside a Docker container with EKS admin credentials
- Key script: `deploy-code-svc-v1.sh`
### 2. GitOps / ArgoCD Deployment (uat71, expanding to more environments)

The pipeline:

- Clones g360_env_configs
- Updates `configs/{service}/envs/uat71.yaml` with the new image tag and deployment metadata
- Commits and pushes — the GitHub webhook triggers an ArgoCD sync automatically
- ArgoCD handles the actual Kubernetes rollout
- Key script: `deploy-code-gitops-v1.py`
- Health monitoring: `argocd-health-monitor.py`
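As an illustration of the GitOps handoff, here is a simplified sketch of what a script like `deploy-code-gitops-v1.py` does. The key names inside the values file (`image.tag`) are assumptions for the example; the actual file layout may differ.

```python
import subprocess
import yaml  # pip install pyyaml

def bump_image_tag(service: str, env: str, new_tag: str) -> None:
    """Update a service's image tag in g360_env_configs; ArgoCD picks up the commit."""
    path = f"configs/{service}/envs/{env}.yaml"
    with open(path) as f:
        values = yaml.safe_load(f) or {}

    values.setdefault("image", {})["tag"] = new_tag  # assumed key structure

    with open(path, "w") as f:
        yaml.safe_dump(values, f, sort_keys=False)

    # Commit and push; the GitHub webhook then triggers the ArgoCD sync
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", f"deploy: {service} {new_tag} to {env}"], check=True)
    subprocess.run(["git", "push"], check=True)
```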
Every service has a config file at `devops_pipelines/repo_artifacts/vars/cicd-per-repo/{service}.txt`. This defines:

- `codename` — Kubernetes deployment name
- `codetype` — Technology stack (ruby, python, dotnet, node)
- `codeversion_*` — Runtime versions per environment tier
- Container registry URLs (lower vs prod accounts)
- Dockerfile and Helm template references
- Health check paths, database names, AppConfig settings
The script get-env-vars-repo-svc-v1.sh parses these files and exports all values as environment variables for the pipeline.
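A rough Python sketch of that parsing step, assuming the config files are simple `KEY=value` pairs. The real script is shell, and the exact file format is an assumption here.

```python
import os

def load_service_config(service: str) -> dict[str, str]:
    """Parse a cicd-per-repo config file into a dict of pipeline variables."""
    config = {}
    path = f"repo_artifacts/vars/cicd-per-repo/{service}.txt"
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config

# The shell script exports these as environment variables for later pipeline steps
for key, value in load_service_config("payment-service-v2").items():
    os.environ[key] = value
```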
For deploying multiple services at once (e.g., cutting a release branch to CUAT environments), use the release-branch-* workflows in devops_pipeline_initiators_eng:
- `release-branch-backend-marketplace.yml` — Deploy all 36 backend services or a selected subset
- `release-branch-frontend-marketplace.yml` — Deploy all frontend apps
- Brand-specific variants for Hilton, IHG, Wyndham
These generate deployment matrices dynamically and skip services where the target branch doesn't exist.
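Conceptually, the matrix generation looks something like the sketch below. The function is hypothetical (the real logic lives in the workflow), but the branch-lookup endpoint is the standard GitHub REST API.

```python
import json
import requests  # pip install requests

def build_matrix(org: str, services: list[str], branch: str, token: str) -> str:
    """Return a GitHub Actions matrix JSON of services whose repo has the branch."""
    headers = {"Authorization": f"Bearer {token}"}
    include = []
    for service in services:
        url = f"https://api.github.com/repos/{org}/{service}/branches/{branch}"
        if requests.get(url, headers=headers).status_code == 200:
            include.append({"service": service, "branch": branch})
        # Services without the release branch are silently skipped
    return json.dumps({"include": include})
```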
## How Terraform is Managed

All AWS infrastructure lives in devops_aws_terraform, organized by service:
| Module | What it Manages |
|---|---|
| `eks-cluster/aws-eks-cluster` | EKS clusters, node groups, OIDC federation, Helm addons, KMS encryption |
| `eks-cluster/aws-eks-networking` | VPC, subnets, Transit Gateway, VPC peering, security groups |
| `eks-cluster/aws-eks-spc` | Secret Provider Class for AWS Secrets Manager → K8s secrets integration |
| `aws-iam-service` | IAM users, groups, roles, Lambda-based credential distribution |
| `aws-container-registry` | ECR repositories with lifecycle policies and cross-account pull |
| `aws-s3-artifacts` | S3 buckets across environments |
| `aws-glue-etl` | Glue jobs, connections, catalogs, DynamoDB tables |
| `aws-glue-etl-jobs` | Modular Glue job definitions |
| `aws-appconfig` | AppConfig deployment strategies, IAM roles |
| `aws-ec2-gh-runners-artifacts` | Self-hosted GitHub Actions runners (always-on and on-demand) |
| `aws-redshift` | Redshift cluster and schema management |
| `aws-break-glass` | Emergency break-glass access and backup infrastructure |
GitHub organization management lives separately in devops_github_terraform:
| Module | What it Manages |
|---|---|
| `repos` | Repository settings, branch protection, collaborators, environments, secrets |
| `teams` | Team membership — Engineering, Marketplace, Core Services, QA, DevOps, Security, Architects |
All Terraform state is stored in S3 with DynamoDB locking:
```
Bucket: g360-tfstate-{environment}
Key:    {project}/{environment}/{cluster}/terraform.tfstate
Lock:   DynamoDB table g360-tfstate-{environment}
```
| Environment | AWS Account | Region | S3 Bucket |
|---|---|---|---|
| lower | 599778853101 | us-east-2 | g360-tfstate-lower |
| prod | 638757669574 | us-east-1 | g360-tfstate-prod |
| infra | (shared) | various | g360-tfstate-infra |
| breakglass | 663568... | us-west-2 | g360-tfstate-breakglass |
Cross-module references use terraform_remote_state data sources to read outputs from related modules (e.g., EKS cluster reads network state for VPC/subnet IDs).
Every module has a Makefile with standardized targets:
```bash
make init INFRA_ENV=lower CLUSTER_NAME=eks-cluster-02   # Initialize backend
make plan INFRA_ENV=lower ACCOUNT_ID=599778853101       # Generate plan
make apply INFRA_ENV=lower                              # Apply changes
make destroy INFRA_ENV=lower                            # Destroy resources
make format                                             # Format HCL files
make validate                                           # Syntax check
```

The Makefile sets `AWS_PROFILE={INFRA_ENV}-tf` automatically and selects the correct `.tfvars` file.
devops_aws_terraform has 24 GitHub Actions workflows in .github/workflows/, one per module. Each follows the plan → approve → apply pattern:
- The plan job generates a `terraform.plan` artifact
- The apply job downloads the artifact and applies it (requires environment-based approval)
- Self-hosted runners: `g360-infra` for lower/infra, `ci-cd-prod` for production
You can trigger these from the devops_terraform_initiators repo or directly from the workflow files.
Terraform version is pinned to 1.10.4 across all workflows.
## How Kubernetes is Managed

| Cluster | Environment | Region | Purpose |
|---|---|---|---|
| `eks-cluster-02` | lower | us-east-2 | All lower environments (dev, qa, uat, inter, etc.) |
| `g360-infra-cluster` | infra | us-east-1 | Shared infrastructure and tooling |
| `g360-prod-core-01` | prod | us-east-1 | Production workloads |
All clusters use:
- Private-only API endpoints (no public access)
- Managed node groups with CloudWatch-based autoscaling
- Karpenter for dynamic node provisioning
- OIDC federation for IAM role assumption (no static credentials)
- KMS encryption for secrets at rest
- Core addons: CoreDNS, kube-proxy, VPC CNI, EBS CSI driver
The EKS infrastructure is provisioned through Terraform in this order:
1. `aws-eks-networking` — VPC, subnets, Transit Gateway
2. `aws-eks-cluster` — Cluster, node groups, IAM, Helm addons
3. `aws-eks-spc` — Secret Provider Class
Service deployments use standardized Helm templates stored in devops_pipelines/repo_artifacts/helm/v3.16.4/template/.
The primary template (g360_template_LIVEPROB_file) generates:
- Deployment — with Datadog APM integration, liveness probes via file check (`/k8spod_liveprobe/liveprobe.txt`), resource limits, node selectors
- Service — ClusterIP or LoadBalancer
- Ingress — Kubernetes ingress rules
- HPA — Horizontal Pod Autoscaler
- PDB — Pod Disruption Budget
- ServiceAccount — for OIDC/IAM role binding
- RBAC — Role and RoleBinding (per-service opt-in)
Environment-specific deployment YAMLs live in separate directories:
- `deployment_yamls_lower/` — Lower environment overrides
- `deployment_yamls_prod/` — Production overrides
ArgoCD is bootstrapped and managed via g360_argocd. It follows the App of Apps pattern:
- A root application manages ArgoCD itself (self-management)
- Platform apps are declared in `platform-apps/`:
  - cert-manager
  - AWS Load Balancer Controller
  - Karpenter
  - GitHub Actions Runner Controller (ARC)
Service deployments via ArgoCD (currently uat71 only):
- ArgoCD watches g360_env_configs for changes
- ApplicationSet definitions in `appsets/services/{service}.yaml`
- Per-environment values in `configs/{service}/envs/{env}.yaml`
- Automated sync with prune and self-heal enabled
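A minimal sketch of the idea behind `argocd-health-monitor.py`, assuming access to ArgoCD's public REST API (`/api/v1/applications/{name}`). The polling interval, timeout, and failure policy here are illustrative, not the real script's behavior.

```python
import time
import requests  # pip install requests

def wait_for_healthy(argocd_url: str, app: str, token: str, timeout: int = 600) -> None:
    """Poll an ArgoCD application until it reports Healthy/Synced or times out."""
    headers = {"Authorization": f"Bearer {token}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(f"{argocd_url}/api/v1/applications/{app}", headers=headers)
        resp.raise_for_status()
        status = resp.json()["status"]
        health = status["health"]["status"]  # e.g. Healthy, Progressing, Degraded
        sync = status["sync"]["status"]      # e.g. Synced, OutOfSync
        if health == "Healthy" and sync == "Synced":
            return
        if health == "Degraded":
            raise RuntimeError(f"{app} is Degraded; rollout failed")
        time.sleep(10)
    raise TimeoutError(f"{app} did not become Healthy within {timeout}s")
```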
Three ECR registries, one per AWS account:
| Account | Region | Use |
|---|---|---|
| 599778853101 | us-east-2 | Lower environment images (builds happen here) |
| 638757669574 | us-east-1 | Production images |
| 294892597080 | us-east-1 | Shared infrastructure images |
Docker images use a three-tier strategy:
- Base images — Foundation with runtime environments (`repo_artifacts/dockerfiles/new-base-img/`)
- Build images — CI/CD build tooling (`repo_artifacts/dockerfiles/cicd/`)
- Runtime images — Final application containers
## How Database Migrations Work

Database migrations are integrated into the deployment pipeline and run as a separate phase before the application deployment.
Technology-specific migration scripts live in devops_pipelines/workflow_scripts/cicd/db/:
| Language | Auto-Migration Script | Manual Script |
|---|---|---|
| .NET | `dotnet/dotnet-db-auto-migration-v1.sh` | `dotnet/dotnet-db-v1.sh` |
| Python | `python/python-db-auto-migration-v1.sh` | `python/python-db-v1.sh` |
| Ruby | `ruby/ruby-db-auto-migration-v1.sh` | `ruby/ruby-db-legacy-v1.sh` |
Option 1: During deployment — When triggering a service deployment via the marketplace, select the "db-mods" dropdown option (e.g., service-YES-db-mods). This runs migrations before the application deployment.
Option 2: Auto-migration workflow — The SVC_auto-migration.yml workflow in devops_pipeline_initiators_eng is dedicated to running migrations. It supports all four tech stacks and always enables DB modifications.
Option 3: Standalone migration workflows — The following workflows in devops_pipelines can be called directly:
- `cicd-db-v1.yml` — Standard migrations
- `cicd-db-auto-migration-v1.yml` — Auto-migrations with AppConfig integration
- `cicd-db-legacy-v1.yml` — Legacy migration support
- The pipeline detects the service's `codetype` (dotnet/python/ruby) from the service config
- Database credentials are fetched from AWS AppConfig via `fetch-appconfig.sh`
- The appropriate migration script runs inside a container with database connectivity
- For .NET: EF Core migrations; for Python: Alembic or custom scripts; for Ruby: ActiveRecord migrations
- Legacy vs non-legacy detection is automatic (the auto-migration script handles both)
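In Python terms, that dispatch works roughly like the sketch below. The script paths come from the table above; the arguments passed to each script and the surrounding container setup are assumptions.

```python
import subprocess

# Maps a service's codetype to its auto-migration script
# (paths relative to devops_pipelines/workflow_scripts/cicd/db/)
MIGRATION_SCRIPTS = {
    "dotnet": "dotnet/dotnet-db-auto-migration-v1.sh",
    "python": "python/python-db-auto-migration-v1.sh",
    "ruby": "ruby/ruby-db-auto-migration-v1.sh",
}

def run_migrations(codetype: str, service: str, env: str) -> None:
    """Run the tech-stack-specific migration script for a service."""
    script = MIGRATION_SCRIPTS.get(codetype)
    if script is None:
        raise ValueError(f"No migration script for codetype '{codetype}'")
    subprocess.run(
        ["bash", f"workflow_scripts/cicd/db/{script}", service, env],
        check=True,  # a non-zero exit fails the DB phase and aborts the deployment
    )
```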
The env_db_new_sync_all.yml workflow handles database synchronization across environments — useful for refreshing lower environments with production-like data.
## Environment Configuration

Groups360 maintains 30+ environments across the following tiers:
| Tier | Environments | Purpose |
|---|---|---|
| Dev | dev51, dev52, dev53, dev54, dev55 | Active development and feature testing |
| QA | qa61, qa62, qa63, qa65 | QA validation |
| UAT | uat71, uat72, uat73, uat74, uat75 | User acceptance testing |
| Integration | inter31, inter34, inter35 | Integration testing |
| Platform | plat21, plat25 | Platform-level testing |
| Sandbox | sab11, sab15 | Sandbox / experimentation |
| Search | search41-search45 | Search service testing |
| Partner | part81-part85 | Partner integration testing |
| Demo | demo01 | Demonstrations |
| Brand CUAT | hiltoncuat01, ihgcuat01, wyndcuat01, hyattcuat01, micuat01 | Brand-specific customer UAT |
| Brand Dev/QA | hiltondev1-3, hiltonqa3 | Brand-specific development |
| Production | prod01 | Live production |
Configuration comes from four layers:

- **Service config** (`cicd-per-repo/{service}.txt`) — Runtime versions, registry URLs, health checks, AppConfig settings. Applies across all environments.
- **Environment files** (env_files_all) — Per-environment, per-service overrides. Organized as `{env}/svcs/{tech_stack}/{service}/`. The Helm deployment path reads these.
- **AWS AppConfig** (managed by appconfig_toolkit) — Application-level settings like database connection strings, feature flags, API keys. Supports multi-format configs (JSON, YAML, key-value) across 47 services and 4 tech stacks.
- **GitOps config** (g360_env_configs) — For ArgoCD-managed environments. Image tags and deployment metadata stored as YAML.
The appconfig_toolkit is a Python CLI that manages AWS AppConfig configuration:
```bash
# Sync all service configurations to an environment
python -m appconfig_toolkit.cli sync-all --env dev55

# Sync a specific service
python -m appconfig_toolkit.cli sync-service --service payment-service-v2 --env dev55

# Detect configuration drift
python -m appconfig_toolkit.cli reconcile --env dev55

# Preview rendered configuration
python -m appconfig_toolkit.cli preview --service payment-service-v2 --env dev55
```

Templates are Jinja2-based, stored in `templates/services/{service}.appsettings.json.j2`, with environment variables sourced from env_files_all.
Production sync requires the --prod flag and uses a separate AWS account with cross-account IAM roles.
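The rendering step is conceptually straightforward. Here is a sketch using Jinja2 directly rather than the toolkit's internal APIs; the variable names (`DB_HOST`) are illustrative only.

```python
import json
from jinja2 import Environment, FileSystemLoader  # pip install jinja2

def render_service_config(service: str, env_vars: dict) -> dict:
    """Render a service's AppConfig template with environment-specific values."""
    jinja_env = Environment(loader=FileSystemLoader("templates/services"))
    template = jinja_env.get_template(f"{service}.appsettings.json.j2")
    rendered = template.render(**env_vars)  # values sourced from env_files_all
    return json.loads(rendered)  # verify the output is still well-formed JSON

# Example: preview the rendered config for one service in one environment
config = render_service_config("payment-service-v2", {"DB_HOST": "dev55-db.internal"})
```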
## Secrets Management

Secrets are stored in AWS Secrets Manager and injected into Kubernetes pods via the Secret Provider Class (managed by the aws-eks-spc Terraform module).
Nightly automated backups run via backup-secrets-nightly.yml:
- Hybrid encryption: RSA (wraps the AES key) + AES (encrypts the secrets)
- Storage: encrypted output uploaded to S3
- Restore: `restore_secrets.py`
- Key management: RSA keypairs generated via `generate_keys.py`
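For intuition, here is a minimal sketch of the hybrid RSA+AES pattern described above, using the `cryptography` library. This is not the actual backup script; key sizes, storage format, and function names are assumptions.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def hybrid_encrypt(plaintext: bytes, public_key: rsa.RSAPublicKey) -> tuple[bytes, bytes, bytes]:
    """Encrypt data with a fresh AES key, then wrap that key with RSA-OAEP."""
    aes_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, standard for AES-GCM
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(
        aes_key,
        padding.OAEP(mgf=padding.MGF1(hashes.SHA256()), algorithm=hashes.SHA256(), label=None),
    )
    return wrapped_key, nonce, ciphertext  # all three parts get stored together

# Only the holder of the RSA private key can unwrap the AES key and restore
private_key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
wrapped, nonce, blob = hybrid_encrypt(b'{"secret": "value"}', private_key.public_key())
```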
CI/CD pipeline authentication uses GitHub repository and organization secrets:
- `ACTIONS_PIPELINE` — PAT for cross-repo workflow calls
- AWS credentials for ECR, EKS, and Secrets Manager access
- Datadog API keys for observability
## Monitoring and Observability

All services ship with built-in Datadog APM:
- Helm templates inject `DD_ENV`, `DD_SERVICE`, and `DD_VERSION` environment variables
- Prometheus scraping enabled via pod annotations
- Log injection via `DD_LOGS_INJECTION` for structured logging
- APM tracing via a Unix socket mount to the Datadog DaemonSet agent
- CI/CD tracing via `datadog-ci` wrapping major pipeline steps
Pipeline-level observability:
- `datadog-github-compare.yml` — Compares deployment metrics
- Custom tags and measures documented in `docs/datadog-custom-tags-measures.md`
Nightly backups of the entire GitHub organization via backup-github-nightly.yml, using github_org_backup.py.
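The backup concept is roughly: enumerate the organization's repositories via the API, then mirror-clone each one. A sketch follows; the real `github_org_backup.py` may capture more (issues, settings, etc.).

```python
import subprocess
import requests  # pip install requests

def backup_org(org: str, token: str, dest: str = "backups") -> None:
    """Mirror-clone every repository in a GitHub organization."""
    headers = {"Authorization": f"Bearer {token}"}
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/repos",
            headers=headers,
            params={"per_page": 100, "page": page},
        )
        resp.raise_for_status()
        repos = resp.json()
        if not repos:
            break  # no more pages
        for repo in repos:
            # --mirror captures all refs, not just the default branch
            subprocess.run(
                ["git", "clone", "--mirror", repo["clone_url"], f"{dest}/{repo['name']}.git"],
                check=True,
            )
        page += 1
```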
## Quick Reference

| Task | Where to Go |
|---|---|
| Deploy a service to dev/qa/uat | devops_pipeline_initiators_eng → Actions → ENG \| SVC |
| Deploy a web app | devops_pipeline_initiators_eng → Actions → ENG \| WEBAPP \| MP |
| Deploy to production | devops_pipeline_initiators_prod → Actions |
| Run database migrations | devops_pipeline_initiators_eng → Actions → ENG \| SVC Auto-Migration |
| Plan/apply Terraform | devops_terraform_initiators → Actions |
| Manage GitHub repos/teams | devops_github_terraform |
| Add a new service to CI/CD | Create config in devops_pipelines/repo_artifacts/vars/cicd-per-repo/ |
| Update environment config | env_files_all |
| Manage AppConfig settings | appconfig_toolkit |
| View ArgoCD config | g360_argocd |
## Further Reading

- devops_pipelines CLAUDE.md — CI/CD architecture deep dive
- ArgoCD GitOps Runbook
- GitOps Migration Guide
- EKS Cluster Setup
- Helm ECR Deployment Guide