Secret Management Scope

.NET Core Engineering manages a large number of secrets, which makes it difficult to both manage and reason about them.

What follows is an overview of why we need a process for secret management, followed by the scope of secret management that .NET Core Engineering owns.

Overview

Why

The experience of managing our secrets is error-prone because it requires manual developer steps.
Conventions for secrets are loosely defined, and nothing strictly enforces them.
Metadata for a secret is not sufficient to identify the intent of the secret.
Not every secret is accounted for; it is easy to create a secret that is not easily tracked.
- We don't know where our secrets are being used, how often, or even if they are still being used at all; this means that rotating secrets can have unknown consequences
- We don't know if we have duplicated secrets

Principles

We need to be able to track our secrets:
- who is using our secrets
- where are secrets being used
- how often are secrets being used
We need to provide a uniform way to create secrets and to attach metadata to each secret
- Common metadata includes:
  - Intent
  - Last modified datetime
  - Last modified by
  - Secret type
  - Expiration datetime
We need a way to prevent our services from accessing secrets we don't manage and/or bad secrets, or to alert when they do.
We should be able to detect duplication of a secret within the same key vault
We do not need to know every secret that a particular service uses, but we do need to be able to automatically rotate any/all secrets we manage
Secret consumers need to have live access to required secrets
We need to be able to validate, at any moment, that specific (or all) secrets we manage are valid
The system must not be designed so that managing a new secret type requires significant overhead
We need to be able to manage (rotate, delete, expire) any secret we own using a common system
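
The metadata, duplicate-detection, and validity principles above could be sketched roughly as follows. This is a minimal illustration in Python, not an existing .NET Core Engineering tool: `SecretMetadata`, its field names, `fingerprint`, `find_duplicates`, and `expired` are all hypothetical. Duplicates are found by comparing SHA-256 digests, an assumption chosen so that raw secret values never need to be compared or logged directly.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SecretMetadata:
    """Hypothetical record mirroring the common metadata listed above."""
    name: str
    intent: str
    secret_type: str
    last_modified: datetime
    last_modified_by: str
    expires: datetime


def fingerprint(value: str) -> str:
    """Return a SHA-256 digest so values can be compared without exposing them."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def find_duplicates(vault: dict[str, str]) -> list[list[str]]:
    """Group the secret names in one vault that share the same value."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, value in vault.items():
        by_digest[fingerprint(value)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def expired(meta: SecretMetadata, now: datetime) -> bool:
    """Treat a secret as no longer valid once its expiration time has passed."""
    return now >= meta.expires
```

Here `vault` is a plain name-to-value mapping standing in for the contents of one key vault; a real check would read values through whatever vault client the rotation system uses.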

Scope

What

We manage a large variety of secret types in .NET Core Engineering. We do not intend to move every secret we own to the new model, but we will move the secrets related to the primary services we own (Helix services, Arcade services, etc.), and we will not preclude managing other secrets. Below are the primary types of secrets that we manage.

Automatic Rotation

These secret types can be rotated by automation.

Azure Storage connection string or SAS token - Easily rotated using Azure APIs. Metadata needs to include the account/resource and required permissions.
Service Bus connection string - Easily rotated using Azure APIs. Metadata needs to include the namespace and required permissions.
Event Hub connection string - Easily rotated using Azure APIs. Metadata needs to include the namespace and required permissions.
SQL Database connection string - Can be rotated, assuming the rotation service can get permissions. Metadata needs to include the database server/name and required permissions.
Random base64 "key" - The only instance of this is a key used to encode job cancellation tokens; it can be rotated very easily.

Manual Rotation

These secrets cannot be rotated without human intervention.

Kusto connection string - This is just a service principal secret; we should probably use MSI for this.
Azure DevOps access token - Metadata should include the required organizations and scopes for the token.
GitHub access token - Metadata needs to include the required scopes and accessible repos.
Maestro/Helix access tokens - Standard token.
GitHub app secrets - These can't be rotated without disruption.
Domain account password - Metadata should say which account it is.
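
One way to model the split between automatic and manual rotation is a registry keyed by secret type, which would also keep the cost of adding a new type low. The following is only a sketch: the type names, `register_rotator`, `rotate`, and `ManualRotationRequired` are hypothetical, the 32-byte key size is an arbitrary assumption, and rotators for Azure resources would call the relevant Azure APIs, which are deliberately not shown.

```python
import base64
import secrets
from typing import Callable, Dict

# Maps a secret type name to a function that returns a fresh secret value.
# Types without an entry are treated as requiring manual rotation.
ROTATORS: Dict[str, Callable[[], str]] = {}


def register_rotator(secret_type: str):
    """Decorator that registers an automatic rotator for a secret type."""
    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        ROTATORS[secret_type] = func
        return func
    return decorator


@register_rotator("random-base64-key")
def rotate_random_key() -> str:
    """Generate a new random key, base64 encoded (32 bytes is an assumption)."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


class ManualRotationRequired(Exception):
    """Raised when a secret type has no automatic rotator."""


def rotate(secret_type: str) -> str:
    """Return a new value for the secret, or signal that a human is needed."""
    rotator = ROTATORS.get(secret_type)
    if rotator is None:
        raise ManualRotationRequired(secret_type)
    return rotator()
```

With this shape, supporting a new automatically rotated type means registering one more function, while any unregistered type (for example, a GitHub access token) falls through to the manual path.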

Who

Who uses our secrets? For this document, the users of our secrets (actors) are defined as those entities that perform "actions" associated with our secrets.

Actors

First responders - responsible for rotating secrets
.NET Core Engineering team - create secrets for use in .NET Core Engineering services
.NET Core teams / partners - create secrets that are used in AzDo builds
.NET Core Engineering services - consume secrets created by other actors
AzDo builds - consume secrets created by other actors

Actions

Create - create a secret, store secret in Key Vault, add secret metadata
Manage - rotate a secret, expire a secret, delete a secret
- rotation (typically) involves changing a secret value in an application and then updating the value in Key Vault so that the correct value is associated with the secret
Use - use a secret in an application, or via Key Vault access
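
The actor and action definitions above can be summarized as a small lookup table. This is an illustrative sketch only: the enum, the actor identifiers, and `performs` are hypothetical names, and the mapping simply restates which actions this document attributes to each actor (rotation by first responders is counted under Manage).

```python
from enum import Enum


class Action(Enum):
    CREATE = "create"
    MANAGE = "manage"
    USE = "use"


# Actions each actor performs, as described in the Actors list.
ACTOR_ACTIONS = {
    "first-responders": {Action.MANAGE},
    "dotnet-core-engineering-team": {Action.CREATE},
    "dotnet-core-teams-and-partners": {Action.CREATE},
    "dotnet-core-engineering-services": {Action.USE},
    "azdo-builds": {Action.USE},
}


def performs(actor: str, action: Action) -> bool:
    """Return True if this document lists the action for the given actor."""
    return action in ACTOR_ACTIONS.get(actor, set())
```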

When

The model for when secrets will be rotated is TBD. For example, will we rotate secrets on demand using automation, on a regular schedule, or in some other way?

chcosta/gist:1287b33f8379873969cd9f81dc89de28