HashiCorp Vault Pro (exam notes)

Exam objectives

  1. Create a working Vault server configuration given a scenario

    1a Enable and configure secret engines

    1b Practice production hardening

    1c Auto unseal Vault

    1d Implement integrated storage for open source and Enterprise Vault

    1e Enable and configure authentication methods

    1f Practice secure Vault initialization

    1g Regenerate a root token

    1h Rekey Vault and rotate encryption keys

  2. Monitor a Vault environment

    2a Monitor and understand Vault telemetry

    2b Monitor and understand Vault audit logs

    2c Monitor and understand Vault operational logs

  3. Employ the Vault security model

    3a Describe secure introduction of Vault clients

    3b Describe the security implications of running Vault in Kubernetes

  4. Build fault-tolerant Vault environments

    4a Configure a highly available (HA) cluster

    4b [Vault Enterprise] Enable and configure disaster recovery (DR) replication

    4c [Vault Enterprise] Promote a secondary cluster

  5. Understand the hardware security module (HSM) integration

    5a [Vault Enterprise] Describe the benefits of auto unsealing with HSM

    5b [Vault Enterprise] Describe the benefits and use cases of seal wrap (PKCS#11)

  6. Scale Vault for performance

    6a Use batch tokens

    6b [Vault Enterprise] Describe the use cases of performance standby nodes

    6c [Vault Enterprise] Enable and configure performance replication

    6d [Vault Enterprise] Create a paths filter

  7. Configure access control

    7a Interpret Vault identity entities and groups

    7b Write, deploy, and troubleshoot ACL policies

    7c [Vault Enterprise] Understand Sentinel policies

    7d [Vault Enterprise] Define control groups and describe their basic workflow

    7e [Vault Enterprise] Describe and interpret multi-tenancy with namespaces

  8. Configure Vault Agent

    8a Securely configure auto-auth and token sink

    8b Configure templating


1 Create a working Vault server configuration given a scenario

1a Enable and configure secret engines

  • Focus on generic secret engines (SE):
    • cubbyhole enabled by default,
    • KV (v1, kv-v2),
    • database (13+ different db plugins),
    • identity,
    • transit (can be used for transit auto-unseal),
    • PKI (x.509)
  • enabled and isolated at a unique path
  • interactions with SE are done using the path
  • CLI: v secrets disable/enable/list/move/tune <type>:
v secrets enable aws
v secrets tune -default-lease-ttl=72h pki/
# Good for checking KVv1 vs KVv2:
v secrets list --detailed
v secrets enable -path=developers -description="my first kv" kv

Cubbyhole

  • to store arbitrary secrets, cannot be moved or disabled or enabled multiple times
  • cubby/cubbyhole is where kids put their backpacks - it has their name on it and it is only for them (each token gets its own cubby)
  • its lifetime is linked to the token used to write the data (no TTL)
  • even root cannot read this data (if it wasn't written by the root)!
  • important for "response wrapping" (a user who has access to a secret can wrap it and securely send it to a user who does not have direct access)
# Write data to cubbyhole via API
curl --header "X-Vault-Token: hvs.QRx4pz2RIka7RhhrjiVRBNjq" --request POST --data '{"certification": "hcvop"}' https://vault.training.com:8200/v1/cubbyhole/training
# Read secret from cubbyhole via API
curl --header "X-Vault-Token: hvs.QRx4pz2RIka7RhhrjiVRBNjq" https://vault.training.com:8200/v1/cubbyhole/training
# Wrapping the secret, get the wrapping_token from the response (single use, not renewable)
v kv get -wrap-ttl=5m secrets/certification/hcvop
# Unwrap
v unwrap <wrapping-token>

KV SE

  • for KVv2, make sure you add the data/ & metadata/ prefixes in your API calls (/v1/kvv2/data/) and when writing policies (in the path) - see the policy sketch after the commands below
# v1 & v2
v kv put/get/delete/list
v kv put kv/app/db pass=123 # written to kv/app/db
v kv put kv/app/db @secrets.json
# v2 only
v kv undelete/destroy/patch/rollback
v kv put kv/app/db pass=123 # written to kv/data/app/db
v kv rollback -version=1 kv/app/db
v kv patch kv/app/db pass=456
v kv get -version=3 -format=json kv/app/db
v kv metadata get kvv2/apps/circleci
v kv metadata put -custom-metadata=abc=123 kvv2/apps/circleci
v kv metadata delete kvv2/apps/circleci
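  • a minimal KVv2 policy sketch (assuming a KVv2 mount at kv/ as in the examples above) - note the data/ & metadata/ prefixes:
# Manage the secret data itself
path "kv/data/app/*" {
  capabilities = ["create", "read", "update", "delete"]
}
# Browse keys and read version metadata
path "kv/metadata/app/*" {
  capabilities = ["list", "read"]
}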

Database SE

  1. Configure Vault with access to the database
v write database/config/prod_database \
  plugin_name=mysql-database-plugin \
  connection_url="{{username}}:{{password}}@tcp(prod.hcvop.com:3306)/" \
  allowed_roles="app-integration, app-hcvop" \
  username="vault-admin" \
  password="vneJ4908fkd3084Bmrk39fmslslf#e&349"
  2. Configure roles based on permission required
v write database/roles/app-hcvop \
  db_name=prod_database \
  creation_statements="CREATE USER '{{name}}'@'%' IDENTIFIED BY '{{password}}';GRANT SELECT ON *.* TO '{{name}}'@'%';" \
  default_ttl="1h" \
  max_ttl="24h"
  3. Rotate root credentials
v write -f database/rotate-root/prod_database
  4. Request/generate new credentials
v read database/creds/app-hcvop # needs "read" capabilities to the same path
  5. Revoke
v lease revoke -prefix database/creds/app-hcvop

Identity SE

  • Vault creates an entity and attaches an alias to it if a corresponding entity doesn't already exist
  • Entity = single person or system that logged into Vault, unique value, made up of >0 aliases, consolidation of logins
  • e.g. a Julie Smith entity with LDAP, userpass, and GitHub aliases inherits the capabilities of all of those aliases' policies
  • Group = can contain multiple entities, subgroups, policies can be set on the group
    • internal - frequently used when using Vault namespaces to propagate permissions down to child namespaces
    • external - based on group membership from an external ID provider (LDAP, Okta, OIDC), group name must match the group name in ID provider
# The below process is much simpler in the Web UI:
v policy list
v policy read kv-policy # only read kv/data/automation
v policy read manager # only read kv/data/operations/*
v write auth/userpass/users/bryan password=bryan policies=kv-policy
v login -method=userpass username=bryan # only kv-policy will be assigned
v write identity/entity name="Bryan Krausen" policies=manager # it returns id
v write identity/entity-alias name="bryan" canonical_id="<id_from_entity>" mount_accessor="auth_userpass_0479382c"
v login -method=userpass username=bryan # kv-policy, manager will be assigned

Transit SE

  • encrypt/decrypt data, application never has access to the encryption key
  • can also provide auto-unseal capabilities to other Vault clusters
  • you can limit what version(s) of keys can be used, create, rotate, delete and even export the key
  • easily rewrap ciphertext with a newer version of a key (you can limit the "minimum key version" allowed to be used for decryption)
  • default encryption key type: aes256-gcm96
  • convergent encryption mode = encrypting the same data always returns the same ciphertext (enables searchable ciphertext)
  • all plaintext must be base64-encoded
v secrets enable transit
v write -f transit/keys/training  # create encryption key named "training"
v write -f transit/keys/training type="rsa-4096"
v write transit/encrypt/training plaintext=$(base64 <<< "Getting Started with HashiCorp Vault")
v write transit/decrypt/training ciphertext="vault:v1:Fpyph6C7r5MUILiEiFhCoJBxelQbsGeEahal5LhDPSoN6HkTO..."
v write -f transit/keys/training/rotate
v read transit/keys/training  # shows latest_version, min_encryption_version, min_decryption_version
v write transit/keys/training/config min_decryption_version=4 # keys will now only have map[4:167962305]
# Permit Encryption Operations
path "transit/encrypt/training" {
  capabilities = ["update"]
}
# Permit Decryption Operations
path "transit/decrypt/training" {
  capabilities = ["update"]
}
v write transit/rewrap/training ciphertext="vault:v1:Fpyph6C7r5MUILiEiFhCoJBxelQbsGeEahal5LhDPSoN6HkTO..."  # returned ciphertext will be newer key_version

PKI SE

  • use for internal certs
  • can only have one CA cert per PKI SE (if you need to issue certs from multiple CAs, use different PKI paths)
  • allows to have shorter TTLs, because certs are now ephemeral, eliminates revocations
  • apps can obtain certificate at runtime and discard at shutdown
  • most likely you're going to use Vault as Intermediate (your root CA is generally offline)
  • most corps already have an existing CA structure in place
  • using multiple PKI secrets engines, Vault can perform the root (but best practice is to have that offline) and the intermediate(s) CA functionality from the same cluster
  • a role is a 1:1 mapping between a policy and a configuration in a SE (for each unique certificate type or configuration, you'll need to create a unique role)
  • Notable configurations include: allowed_domains, allow_bare_domains, allow_subdomains, allow_glob_domains etc.
v secrets enable -path=hcvop_int pki
v secrets tune -max-lease-ttl=720h pki
v secrets tune -max-lease-ttl=8760h hcvop_int
# Generate an Intermediate CSR
v write -format=json hcvop_int/intermediate/generate/internal common_name="hcvop.com Intermediate" | jq -r '.data.csr' > pki_intermediate.csr
# At this point, you need to sign the intermediate with the root CA
# Import signed certificate from the root
v write hcvop_int/intermediate/set-signed [email protected]
v read -field=default hcvop_int/config/issuers
# These are URLs that certificates will contain, and hosts will use to validate whether the certs are valid or revoked
v write pki/config/urls issuing_certificates="https://vault.hcvop.com:8200/v1/pki/ca" crl_distribution_points="https://vault.hcvop.com:8200/v1/pki/crl"
# Create a new web DMZ role
v write pki/roles/web_dmz_role allowed_domains=dmz.hcvop.com allow_subdomains=true allow_bare_domains=false max_ttl=720h allow_localhost=true organization=hcvop country=us
# Generate a new certificate for web_dmz_role (private_key is only visible once so save it!)
v write pki/issue/web_dmz_role common_name=dmzhcp01.dmz.hcvop.com alt_names=portal.dmz.hcvop.com max_ttl=720h
# Revoke a single certificate before the TTL expires
v write pki/revoke serial_number="4d:00:01:30:20:2c:5e:31:ba:a9:7b"
# Keep the storage backend clean by periodically removing certs
v write pki/tidy tidy_cert_store=true tidy_revoked_certs=true

1b Practice production hardening

  • the fewer shared resources, the better (e.g. containerization is fine, but dedicated cluster is preferred)
  • think "single tenancy" where possible, you also should not hav other services contending the resources
  • reduce/eliminate direct access to Vault nodes (ssh, RDP, kubectl exec -it vault etc.)
  • encryption keys are stored in memory
  • exceptions: telemetry, log file agents
  • permit only the required ports to reduce attack surface
  • Default ports include:
    • Vault: 8200 (API), 8201 (replication)
    • Consul: 8500, 8300, 8301
  • prefer immutable upgrades (bring new nodes online, destroy the old nodes) - they guarantee a known state because you know the result of your automation configurations
  • never run Vault as root, create a user named e.g. "vault"
  • limit access to config files and folder to this new Vault user (will need access to write local files/db/audit/snapshots)
  • set permissions on Vault folders to e.g. chmod 740 -R
  • protect storage backend (Consul: use Consul ACLs, limit access to any Consul node, enable verify_server_hostname)
  • disable shell history (or at least exclude 'vault' commands from it)
  • turn on SELinux/AppArmor
  • turn off core dumps (could reveal encryption keys)
  • protect and audit the vault.service file (could point to compromised binaries to leak data)
  • patch OS frequently
  • disable swap (Vault data should never be written to disk) - e.g. enable mlock to prevent memory from being swapped
  • do not use tls_disable and do not terminate TLS at the load balancer - use TLS passthrough to the Vault nodes (see the listener sketch at the end of this list)
  • secure Consul, configure gossip encryption (use consul keygen and -encrypt because TLS only secures the interface)
  • use multiple Audit Devices to log all interactions (collection servers, archive log data)
  • create alerts for the following events:
    • Use of a root token
    • Seal Status of Vault
    • Creation of a new root token
    • Audit Log Failures
    • Vault policy modification
    • Resource Quota Violations
    • Enabling a new auth method
    • Updates to Vault Policies
    • Modification of an auth method role
    • Transit Key Deletion
    • Creation of a new auth method role
    • Cloud-based resource changes
    • Permission denied (403) responses
    • Use of Vault by human-related accounts outside of regular business hours
    • Vault requests originating from unrecognized subnets
    • Transit Minimum Decryption Version Config
  • say no to cleartext credentials (use ENVs where supported, use cloud-integrated services such as IAM, Azure service identities)
  • upgrade Vault frequently
  • stop using root tokens (they have no TTL and are unrestricted, very dangerous)
  • get rid of the initial root token after initial setup (v token revoke <root-token>)
  • verify identity/checksums of Vault binaries
  • disable UI if not using it (ui = false)
  • secure unseal/recovery keys (initialize Vault using PGP keys such as keybase.io), distribute to multiple team members, no single entity should have access to all keys
  • minimize the TTLs for leases and token (smallest as possible)
  • define Max TTLs to prevent renewals beyond reasonable timeframe
  • minimizing TTL also helps reduce burden on the storage backend
  • follow principle of least privilege (only give tokens the path they absolutely require, limit use of * and + in policies, where possible)
  • separate policies for applications and users
  • perform regular backups, use snapshots, regularly test backups - e.g.:
# Opensource Vault
v operator raft snapshot save monday_20230327.snap
# Enterprise Vault automatic snapshots
v write sys/storage/raft/snapshot-auto/config/daily
  • use your existing IdP to provide access to users, do not mirror/manage your users separately
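Example of a hardened listener stanza tying several of the points above together - a minimal sketch, file paths are placeholders:
listener "tcp" {
  address         = "0.0.0.0:8200"
  tls_disable     = false                          # never disable TLS in production
  tls_cert_file   = "/etc/vault.d/tls/vault.crt"   # signed by a trusted CA
  tls_key_file    = "/etc/vault.d/tls/vault.key"
  tls_min_version = "tls12"                        # reject older TLS versions
}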

1c Auto unseal Vault

  • rather than using shared unseal keys, auto unseal uses a trusted cloud-based key to protect the master key

  • e.g. AWS KMS (cloud-based key) -> Master key (typically split into key shards) -> Encryption Key (DEK) -> Storage Backend (Encrypted Data)

  • supported Services include:

    • AWS KMS
    • Azure Key Vault
    • GCP Cloud KMS
    • AliCloud KMS
    • OCI KMS
    • HSM (Vault Enterprise Only)
    • Transit SE of another Vault
  • Vault does not write anything to the cloud-based service, it just uses the cloud-based key to decrypt the master key ONLY during the unseal process

  • newer Vault versions include a seal "health check" that verifies access to the cloud-based key every 10 minutes

  • supports rotation of cloud-based keys

  • keep in mind KMS services are usually regional, which might affect high availability of the service

  • a fail-safe practice is to create a key outside of the cloud-based service and import it into multiple regions

  • example of enabling the auto unseal in AWS:

    • seal "awskms" {
        region      = ""
        kms_key_id  = ""
        endpoint    = ""
      }
    • set the AWS credentials via an IAM role (needs kms:Encrypt, kms:Decrypt, and kms:DescribeKey)
  • example of enabling the auto unseal in Azure:

    • seal "azurekeyvault" {
        vault_name  = ""
        key_name    = ""
      }
    • use Managed Service Identities instead of putting the tenant_id, client_id and client_secret in plaintext in your config
  • example of enabling the auto unseal in Google Cloud:

    • seal "gcpckms" {
        project     = ""
        region      = ""
        key_ring    = ""
        crypto_key  = ""
      }
    • set the instance's service account with Cloud KMS role
  • example of auto unseal with Vault Transit

    • seal "transit" {
        address         = "https://vault.hcvop.com:8200"
        key_name        = "auto-unseal-key"
        mount_path      = "/transit"
        tls_ca_cert     = "/etc/vault/ca.pem"
        tls_client_cert = "/etc/vault/client.pem"
        tls_client_key  = "/etc/vault/key.pem"
      }
    • use environment variable VAULT_TOKEN instead of writing token in plaintext to the config
    • the token needs update capability for transit/encrypt/<key> and transit/decrypt/<key>
    • the process will look like this:
    # On the first node (providing autounseal transit key)
      v login <token>
      v write -f transit/keys/autounseal
      v read transit/keys/autounseal
      v policy read unseal-policy
          # path "transit/encrypt/autounseal" {
          #   capabilities = ["update"]
          # }
          # path "transit/decrypt/autounseal" {
          #   capabilities = ["update"]
          # }
          # path "transit/keys" {
          #   capabilities = ["list"]
          # }
          # path "transit/keys/autounseal" {
          #   capabilities = ["read"]
          # }
      v token create -policy=unseal-policy -ttl=24h
    
    # On the second node (using the key from the first node)
      cat /etc/vault.d/vault.hcl
      # seal "transit" {
      #   address = "http://first_node:8200"
      #   token = "<enter token with unseal policy here>"
      #   mount_path = "transit/"
      #   key_name = "autounseal"
      #   tls_skip_verify = "true"
      # }
      systemctl restart vault
      v status            # it will be initialized=false, sealed=true
      v operator init     # now it will be initialized and unsealed
  • seal migration Shamir to auto-unseal (requires short downtime!):

    1. add seal stanza to the config
    2. systemctl restart vault
    3. v operator unseal -migrate and it will ask for unseal keys X times (note: Recovery Seal Type will still say shamir)
    4. perform this operation on all standby nodes and run v operator step-down on the current leader node
    5. perform unseal migration on the last node (previous leader)
  • auto-unseal to Shamir:

    1. add disabled = true in your seal stanza config
    2. restart Vault service
    3. v operator unseal -migrate and provide RECOVERY keys
  • auto-unseal to auto-unseal (e.g. AWS KMS to Azure):

    1. update the original seal stanza to disabled = true
    2. add the new seal stanza to the config
    3. restart Vault service
    4. v operator unseal -migrate and provide RECOVERY keys

1d Implement integrated storage for open source and Enterprise Vault

  • reduced complexity (similar architecture to Consul, fewer networking requirements, not memory-bound, no network hops)
  • decreased costs
  • easier to troubleshoot
  • uses Raft protocol, HA & durable backend that does not rely on external systems
  • data is stored locally on each node and replicated to all other nodes for HA
  • recommended to use high IOPS volumes
  • features:
    • replication
    • auto snapshots - schedule to save to e.g. AWS S3
    • cloud autojoin - discover Vault nodes based on cloud-based tags
    • autopilot - increase operational efficiency (day 2)
  • example of the config stanza for integrated storage:
storage "raft" {
  path        = "/opt/vault/data"           # local directory
  node_id     = "vault-node-a.hcvop.com"    # name of the local node
  # retry_join = 10.1.100.135, 10.1.100.136 # example of automatic join based on IPs
  retry_join {                              # find the other nodes with these AWS tags
    auto_join = "provider=aws region=us-east-1 tag_key=vault tag_value=us-east-1"
    # leader_api_addr                       # static, potential leader in the cluster (DNS, IP), needs more retry_join blocks for each
  }
  performance_multiplier = 1                # detect leadership failures (5 = dev, 1 = prod)
}
  • interact with integrated storage using CLI:
# Joins a node to the raft cluster, specify the leader
v operator raft join https://vault-0.hcvop.com:8200
# Remove a specific node from the raft cluster (name of the node to be removed)
v operator raft remove-peer vault-4
# Returns the set of raft peers, shows leader & follower
v operator raft list-peers
# Force local node to step down as a leader
v operator step-down
# Restores and saves snapshots from the raft cluster
v operator raft snapshot save temp.snap
v operator raft snapshot restore temp.snap
# Autosnapshot (Enterprise only) to local folder
v write sys/storage/raft/snapshot-auto/config/hourly interval=1h retain=24 storage_type=local path_prefix=/opt/vault local_max_space=100
v read sys/storage/raft/snapshot-auto/config/hourly
# Autosnapshot (Enterprise only) to AWS S3
v write sys/storage/raft/snapshot-auto/config/cloud-daily interval=24h retain=24 storage_type=aws-s3 aws_s3_bucket=my-snapshot-bucket aws_s3_region=us-east-1
v read sys/storage/raft/snapshot-auto/config/cloud-daily
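# Autopilot (listed under features above) has its own raft subcommands, e.g.:
v operator raft autopilot state
v operator raft autopilot get-config
v operator raft autopilot set-config -cleanup-dead-servers=true -dead-server-last-contact-threshold=10m -min-quorum=3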

1e Enable and configure authentication methods

  • responsible for assigning identity and policies to a user and/or software (human vs system)
  • once authenticated, Vault will issue a client token used to make all subsequent Vault requests (read/write) as long as TTL allows
  • most likely AM on the exam: AppRole, Userpass, Token
  • token AM is enabled by default
  • the only auth method available on a newly initialized Vault is the root token
  • UI isn't fully featured like the CLI and API, so there might be things you can't do in the UI
v auth help TYPE
# Enabling via CLI
v auth enable -path=hcvop approle
# Enabling via API
curl --header "X-Vault-Token: s.v2otcpHygZHWiD7BQ7P5aJjL" --request POST --data '{"type": "approle"}' https://vault.hcvop.com:8200/v1/sys/auth/approle
# Disabling
v auth disable hcvop
v auth list

AppRole

  • username/password for machines
  • role = 1:1 mapping between client authentication and the Vault permission requirements
  • each role has a static role-id and can have zero or many secret-ids that can be generated and used for auth
  • example: a fleet of web servers requiring identical permissions in Vault can all use the same role-id, but each have a unique secret-id
  • some configuration parameters for an AppRole role:
    • set the TTL of the resulting token: token_ttl=1h, token_max_ttl=24h
    • set the TTL of the secret-id that you have generated (it does not expire by default): secret_id_ttl=24h
    • configure CIDR restrictions for a role (binds the resulting token as well): token_bound_cidrs="10.1.16.0/16"
    • change the resulting token type to a 'batch' token ('service' token is the default): token_type=batch
# Use the 'auth' prefix to create a new role named 'hcvop-role' in the 'hcvop' AppRole path
v write auth/hcvop/role/hcvop-role token_policies=web-app token_ttl=1h token_max_ttl=24h
# List the roles
v list auth/hcvop/role
# Read the current configuration of the role named 'hcvop-role'
v read auth/hcvop/role/hcvop-role
# You can run it again to re-configure the auth method
v write auth/hcvop/role/hcvop-role secret_id_ttl=10m token_num_uses=10 token_ttl=20m token_max_ttl=30m secret_id_num_uses=40
# Read the 'role-id' for a particular role
v read auth/hcvop/role/hcvop-role/role-id
# Generate a 'secret-id' for a particular role (every time you run this, it generates a different secret-id)
v write -f auth/hcvop/role/hcvop-role/secret-id
# Authenticate to Vault with AppRole via CLI
v write auth/hcvop/login role_id=22549d0d-147a-d6e2-fa2e-9cedd3b20977 secret_id=b30f2778-9943-f930-1683-2d31973e285f
# Authenticate to Vault with AppRole via API - Format the entire response using jq
curl --request POST --data '{"role_id":"22549d0d-...","secret_id":"b30f2778-..."}' https://vault.hcvop.com:8200/v1/auth/approle/login | jq
# Authenticate to Vault with AppRole via API - query only for the client_token using jq
curl --request POST --data '{"role_id":"22549d0d-...","secret_id":"b30f2778-..."}' https://vault.hcvop.com:8200/v1/auth/approle/login | jq -r '.auth.client_token'

Userpass

  • local username & password
  • an admin creates the user and provides the credentials; the user can optionally change the password
  • some configuration parameters for a Userpass user:
    • You can set the TTL of the resulting token: token_max_ttl=24h, token_ttl=1h
    • You can change the token type to be a batch token: token_type=batch
    • You can configure the token to be a use-limited token: token_num_uses=5
    • You can configure CIDR restrictions for a token: token_bound_cidrs="10.1.16.0/16"
v auth enable userpass
v auth enable -path=vault-local userpass
# Create the new user 'hcvop-engineer' and assign a policy
v write auth/userpass/users/hcvop-engineer password=cm084kjfj3@40 policies=engineering-policy token_ttl=15m token_max_ttl=8h
# List users
v list auth/userpass/users
# Read the current config of the 'hcvop-engineer' user
v read auth/userpass/users/hcvop-engineer
# Authenticate with hcvop-engineer user, it will prompt for the password and therefore not show up in history
v login -method=userpass username=hcvop-engineer
# Update/change the password
v write auth/userpass/users/hcvop-engineer/password password=xmeij9dk20je

Token

  • responsible for creating and storing tokens
  • most operations in Vault require an existing token (not all, though - e.g. sys/health)
  • service tokens are the default token type in Vault (hvs.), persisted to storage, renewed, revoked, create child
  • batch tokens are lightweight & scalable (hvb.), not persisted to storage, used for DR Replication cluster promotion as well
# Example - No Max TTL, you can renew infinite number of times
v token create -policy=hcvop -period=24h
# Example - Limited token (2x times only), but with duration
v token create -policy=hcvop -use-limit=2
# Example - Orphan, not affected by the TTL/revocation of its parent token
v token create -policy=hcvop -orphan
# Example - AppRole auth method to generate batch tokens:
v write auth/approle/role/hcvop policies="engineering" token_type="batch" token_ttl="60s"
# Example - AppRole auth method to generate periodic tokens
vault write auth/approle/role/hcvop policies="hcvop" period="72h"
# Example - API with X-Vault-Token
curl --header "X-Vault-Token: hvs.cDIPyitdJKSm46ydTXJOsaQR" --request POST --data '{ "apikey": "3230sc$832d" }' https://vault.hcvop.com:8200/v1/secret/apikey/splunk
# Example - API with Authorization Bearer
curl --header "Authorization: Bearer hvs.cDIPyitdJKSm46ydTXJOsaQR" --request GET https://vault.hcvop.com:8200/v1/secret/data/apikey/splunk
# Example - CLI login
v login # will be prompted for token
v login <TOKEN>
# Example - Revoking, works on root token as well
v token revoke hvs.cDIPyitdJKSm46ydTXJOsaQR

# The only way to "list tokens" is via their accessors:
v list auth/token/accessors
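# An accessor lets you inspect or revoke a token without knowing the token itself:
v token lookup -accessor <accessor>
v token revoke -accessor <accessor>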

1f Practice secure Vault initialization

  • typical process is to initialize, unseal and configure
  • Vault creates the master key and key shares + root token generation during the init
  • we don't want any one person to have access to all unseal/recovery keys so use PGP
  • PGP = private key (do not share) + public key (share)
  • there are various options to the init process:
    • -key-shares=5 - number of key shares to split the generated root key into
    • -key-threshold=3 - number of key shares required to reconstruct the root key
    • -recovery-shares=5 - number of key shares to split the recovery key into, only used in auto-unseal mode
    • -recovery-threshold=3 - number of key shares required to reconstruct the recovery key, only used in Auto Unseal mode
    • -pgp-keys=<keys> - comma-separated list of paths to files on disk containing public PGP keys
    • -recovery-pgp-keys=<keys> - behaves like -pgp-keys, but for the recovery key shares, only used in Auto Unseal mode
    • -root-token-pgp-key=<key> - path to a file on disk containing a binary or base64-encoded public PGP key
# Encrypted recovery keys will come out of the Vault after this command in the same order they were provided
v operator init -key-shares=5 -key-threshold=3 -pgp-keys="/opt/bob.pub,/opt/steve.pub,/opt/stacy.pub,/opt/katie.pub,/opt/dani.pub"
# Same, but for the autounseal, output will be base64, might ask for password during decryption (`| base64 -d | gpg -dq`) if the PGP is password protected
v operator init -recovery-shares=5 -recovery-threshold=3 -recovery-pgp-keys="/opt/bob.pub,/opt/steve.pub,/opt/stacy.pub,/opt/katie.pub,/opt/dani.pub"
# Protecting the initial root token
v operator init -key-shares=5 -key-threshold=3 -root-token-pgp-key="/opt/bryan.pub" -pgp-keys="/opt/bob.pub,/opt/steve.pub,/opt/stacy.pub,/opt/katie.pub,/opt/dani.pub"

1g Regenerate a root token

  • never use RT for day-to-day operations
  • RTs can create other RTs that do have TTL
  • once new auth method is configured and tested, revoke RT
  • what if there is a broken auth workflow (e.g. LDAP unavailable)? Quorum of key holders can regenerate a new root token
  • there are various options for generate-root command:
    • -generate-otp - Generate and print high-entropy one-time-password
    • -init - Start a root token generation
    • -decode=<string> - Decode and output the generated root token
    • -otp=<string> - OTP code to use with -decode or -init
    • -status - Print the status of the current attempt
    • -cancel - Cancel the current attempt
# A One-Time-Password (OTP) will be generated in this first step
v operator generate-root -init
# Key holders each provide their key, they all run this command
v operator generate-root
# Last person (Complete=True) will then get the Encoded Token
# To get the root token after the last person completes it
v operator generate-root -otp="<OTP_from_the_first_step>" -decode="<Encoded_token_from_the_last_person>"

1h Rekey Vault and rotate encryption keys

Rekeying is the process of generating a new root key and the unseal/recovery key shares used to reconstruct the root key. Rotating is used to change the underlying key used to encrypt/decrypt Vault data. New keys are added to the keyring and old values can still be decrypted with the old key.

Rekey Vault

  • creates a new set of Vault recovery/unseal keys
  • you need to have old keys, it does not work if you lost your keys
  • allows you to specify the number of keys and threshold during the rekey process
  • requires a threshold of keys to successfully rekey (similar to an unseal or root token generation)
  • provides a 'Nonce' value to be given to key holders
  • why would you do that?
    • one or more keys are lost (but no more than the threshold, otherwise you are screwed)
    • employees leave the organization
    • your organization requires that you rekey after a period of time as a security measurement
  • there are various options for rekey command:
    • -init - initialize the rekey process
    • -key-shares=<num> - specify the number of key shares
    • -key-threshold=<num> - specify the threshold of keys needed to reconstruct the master key
    • -nonce - pass the nonce value
    • -pgp-key=<keys> - specify the PGP keys to encrypt the generated keys
    • -status - print the status of the current rekey operation
    • -target=recovery - specify that you want recovery keys if using HSM or Auto Unseal
  • no impact/downtime to operation
# Nonce gets generated in this first step ('-target=recovery' is only for auto unseal)
v operator rekey -init -target=recovery
# Key holders each provide their existing key until you meet the threshold
v operator rekey -target=recovery
# The new generated keys will then need to be securely re-distributed to the people

# Example of Vault rekeyed with 3 key shares and a key threshold of 2:
v operator rekey -init -key-shares=3 -key-threshold=2
v operator rekey

Rotate Encryption key

  • this is used to protect data on the backend
  • this encryption key is never available to users, does not require a threshold
  • requires a policy with update and sudo on sys/rotate, and read on sys/key-status:
path "sys/rotate" {
  capabilities = ["update","sudo"]
}
path "sys/key-status" {
  capabilities = ["read"]
}
# Current Encryption key status
v operator key-status
# Rotate, it will show key term, install time, encryption count
v operator rotate

2 Monitor a Vault environment

2a Monitor and understand Vault telemetry

  • the collection of various runtime metrics about the performance of different components of the Vault environment
  • mostly performance monitoring and trending, but can be used for debugging
  • metrics aggregated every 10s and retained for 1m
  • telemetry information is sent to a local or remote agent, which generally forwards it to an aggregation solution such as DataDog or Prometheus
  • supported providers:
    • statsite
    • statsd
    • circonus
    • dogstatsd
    • prometheus
    • stackdriver
  • examples of important metrics:
    • vault.core.handle_request
    • vault.runtime.total_gc_pause_ns
    • mem.used_percent
    • mem.total_bytes
    • vault.audit.log_request
    • vault.policy.get_policy
  • telemetry is configured in the Vault config file under the telemetry stanza:
telemetry {
  dogstatsd_addr = "metrics.hcvop.com:8125"
  dogstatsd_tags = ["vault_env:production"]
}

2b Monitor and understand Vault audit logs

  • not enabled by default!
  • should have more than 1 audit device
  • audit log = JSON
  • detailed log of all authenticated requests and responses to Vault
  • sensitive information is hashed out with a salt using HMAC-SHA256
  • log files should be protected with filesystem permissions (only the Vault user should have access)
  • supported devices:
    • file - appends logs to the file, does not assist with rotation, use fluentd to send to a collector
    • syslog - writes to a local agent only
    • socket - writes to a tcp, udp, or unix socket
  • Vault prioritizes safety over availability: if it cannot write to a persistent audit log, it will stop responding to client requests => Vault is down
  • capabilities needed on path "sys/audit/file" are read, create, list, update, delete, sudo
# Enable file audit device at default path
v audit enable file file_path="/var/log/vault_audit.log"
# Enable file audit device at custom Vault path of "logs/"
v audit enable -path=logs file file_path="/var/log/audit.log"
# Mark the audit device as a local-only device. Local devices are not replicated or removed by replication.
v audit enable -local -path=local_logs file file_path="/var/log/local_audit.log"
# View the audit devices currently enabled, shows replication status (disabled/enabled/na/replicated)
v audit list -detailed
# Disable an Audit Device
v audit disable syslog/
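Because sensitive values are HMAC'ed in the log, you can hash a known plaintext and search for it - a sketch, assuming a file audit device enabled at the default path file/:
# Returns the hmac-sha256:... string to grep for in the audit log
v write sys/audit-hash/file input="my-secret-value"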

2c Monitor and understand Vault operational logs

  • during startup, Vault will log configuration information to the log, such as listeners & ports, logging level, storage backend, Vault version, and much more
  • we can set log level to err, warn, info (default), debug, trace
  • specify the level:
  1. Use the CLI flag -log-level when starting the Vault service: v server -config=/opt/vault/vault.hcl -log-level=debug
  2. Set the environment variable VAULT_LOG_LEVEL on each node (change takes effect after Vault server is restarted): export VAULT_LOG_LEVEL=trace
  3. Set the log_level configuration parameter in the Vault configuration file (change takes effect after Vault server is restarted): log_level=warn
  • to see the log: journalctl -b --no-pager -u vault, in the containerized environment use docker logs vault0

3 Employ the Vault security model

3a Describe secure introduction of Vault clients

  • secret zero is essentially the first secret needed to obtain other secrets
  • in Vault, it is either the auth credentials or Vault token
  • goals & best practices:
  1. Use unique credentials for each application instance provisioned
  2. Limit your exposure if a credential is compromised
  3. Stop hardcoding credentials within the application codebase
  4. Reduce the TTL of the credentials used by applications and reduce long-lived creds
  5. Distribute credentials securely and only at runtime
  6. Use a trusted platform to verify the identities of clients
  7. Employ a trusted orchestrator that is already authenticated to Vault to inject secrets
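  • a sketch of item 6 (trusted platform) using the AWS auth method - the role name and ARN below are placeholder assumptions:
# Enable the AWS auth method
v auth enable aws
# Bind a role to an IAM principal so only that workload identity can log in
v write auth/aws/role/web-app auth_type=iam bound_iam_principal_arn="arn:aws:iam::123456789012:role/web-app" policies=web-app token_ttl=15m
# The workload authenticates with its IAM identity - no secret zero is shipped with the app
v login -method=aws role=web-app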

3b Describe the security implications of running Vault in Kubernetes

  • deploy using the official Helm chart (see the example after this list)
  • recommendations specific to containerized environments:
    • don't offload TLS at the load balancer (ensure end-to-end encryption)
    • use TLS certificates signed by a trusted Certificate Authority (CA) & require TLS v1.2+
    • most commonly, Vault pods are scheduled to run on a separate cluster to reduce/eliminate shared resources
    • disable core dumps via RLIMIT_CORE=0 or ulimit -c 0 inside the container
    • mlock is enabled (memory lock ensures memory from a process isn't swapped to disk), needs securityContext as follows:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        capabilities:
          add: ["IPC_LOCK"]
    • if the container starts as root, processes that escape the container will also have root on the node, so set runAsNonRoot: true
    • do not run Vault as root, regardless of the platform!
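Example of deploying via the official Helm chart - a minimal HA sketch with abbreviated values:
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
helm install vault hashicorp/vault \
  --set='server.ha.enabled=true' \
  --set='server.ha.raft.enabled=true' \
  --set='global.tlsDisable=false'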

4 Build fault-tolerant Vault environments

4a Configure a highly available (HA) cluster

  • ideally, we want something that provides redundancy, failure tolerance, scalability, and a fully replicated architecture
  • for Vault Enterprise, you are limited to either Integrated Storage or the Consul storage backend
  • Multi-Node Cluster using Integrated Storage: define a local path to store replicated data
  • all data is replicated among all nodes in the cluster
  • initial configuration of Integrated Storage is done in the Vault configuration file:
storage "raft" {
  path        = "/opt/vault/data"       # the filesystem path where the data is stored, each node stores the replicated data here
  node_id     = "node-a.hcvop.com"      # identifier of the node in the cluster, must not be duplicated names
  retry_join {                          # optional, but recommended
    auto_join = "provider=aws region=us-east-1 tag_key=vault tag_value=east-1"      # this is using AWS tags to obtain the IPs
    auto_join_scheme = "http"           # by default it is HTTPS, so you only need this line when it's HTTP
  }
}
  • alternatively, you can use multiple retry_join on all of the nodes - each stanza can include DNS names or IP addresses+port:
storage "raft" {                        # this is on node-a and must list all other nodes below
  retry_join {
    leader_api_addr = "https://node-b.hcvop.com:8200"
  }
  retry_join {
    leader_api_addr = "https://node-c.hcvop.com:8200"
  }
  retry_join {
    leader_api_addr = "https://node-d.hcvop.com:8200"
  }
  retry_join {
    leader_api_addr = "https://node-e.hcvop.com:8200"
  }
}
  • another way is to join standby nodes manually: v operator raft join https://active_node.example.com:8200
  • workflow then will be:
    1. login to node A - potential leader, run v operator init, v operator unseal
    2. login to node B, join to the node A - if you don't use auto unseal you need to manually unseal each node
    3. continue with all other nodes
    4. run v operator raft list-peers
    5. when removing the node from the cluster, make sure you always run v operator raft remove-peer node-e

4b [Vault Enterprise] Enable and configure disaster recovery (DR) replication

  • replication is only available in Vault Enterprise
  • primary cluster (system of record) -> secondary cluster (data replicated asynchronously)
  • 2 types of replication:
    • performance - does not replicate tokens or leases to performance secondaries (secondary can read, but not write)
    • disaster recovery - secondary cluster does not provide reads or writes, it is just a stand-by and does not service clients, but it does replicate tokens and leases
  • provides a warm-standby cluster where EVERYTHING is replicated to the DR secondary cluster(s)
  • DR clusters DO NOT respond to clients unless they are promoted to a primary cluster
  • even as an admin or using a root token, most paths on a secondary cluster are disabled, meaning you can't do much of anything on a DR cluster
  • if using DNS, each cluster must be able to resolve the name of the other cluster
  • port requirements: tcp/8200 (cluster bootstrapping), tcp/8201 to transfer data between the primary and DR cluster
  • workflow:
    1. Activate DR Replication on the Primary as a DR Primary
    2. Create a secondary token on the Primary cluster
    3. Activate DR Replication on the Secondary cluster as a DR secondary
    4. Watch Vault replicate the data from the Primary to the new Secondary cluster
  • replication is NOT enabled by default, so you must enable it on each cluster that will participate in the replica set (all clusters with the same configuration)
  • when activating DR replication, Vault enables an internal root CA on the primary Vault cluster - creates a root certificate and client cert
  • Vault creates a mutual TLS connection between the nodes using self-signed certificates and keys from the internal CA - NOT the same TLS configured for the listener
  • secondary token is required to permit a secondary cluster to replicate from the primary cluster (protected with response wrapping)
  • as soon as you enable replication on the secondary cluster, it will wipe all of its current data
# Primary cluster
v read sys/license/status
v write -f sys/replication/dr/primary/enable
v write -f sys/replication/dr/primary/secondary-token id=<id_can_be_anything_e.g._us-east2-dr>
# Secondary cluster
v write -f sys/replication/dr/secondary/enable token=<token_from_primary_cluster_above>
# To monitor/check ANY replication
v read -format=json sys/replication/status
# To monitor/check DR replication
v read -format=json sys/replication/dr/status
# To monitor/check PERFORMANCE replication
v read -format=json sys/replication/performance/status

4c [Vault Enterprise] Promote a secondary cluster

  • promote a secondary DR to primary requires a DR operation token
  • this is generated directly on the DR cluster using the unseal/recovery keys
  • alternatively, you can create a DR Operation Batch Token on the primary BEFORE the failure
  • once you have the token, you can use it to promote the cluster: vault write sys/replication/dr/secondary/promote dr_operation_token=XXXX
# Demote the primary cluster to replication secondary
v write -f sys/replication/dr/primary/demote
# Start the Process, get OTP on the secondary
v operator generate-root -dr-token -init
# Everyone provides the unseal/recovery key
v operator generate-root -dr-token
# Last person will receive 'Encoded Token'
v operator generate-root -dr-token -decode="<ENCODED_TOKEN>" -otp="<OTP>"
# Promote cluster to a new replication primary
v write sys/replication/dr/secondary/promote dr_operation_token="<DECODED_TOKEN>"

5 Understand the hardware security module (HSM) integration

5a [Vault Enterprise] Describe the benefits of auto unsealing with HSM

  • an HSM (hardware security module) is a network-based physical device that can safeguard and manage digital keys
  • these keys can be used for encryption and decryption functions, digital signatures, strong authentication, or other functions
  • HSMs commonly have tamper resistance - meaning that detection of tampering could invoke a response such as deleting the keys so nobody can access them
  • Vault Enterprise integrations with HSM:
    • requires HSM that supports PKCS#11 standard
    • protect root/master key by using HSM to encrypt/decrypt root key
    • auto unseal Vault by storing wrapped key on local storage
    • seal wrapping to provide extra layer of protection for FIPS 140-2 compliance
    • entropy Augmentation to generate randomness for cryptographic operations
  • example of the HSM auto unseal configuration:
seal "pkcs11" {
  lib             = "/usr/vault/lib/libCryptoki2_64.so"
  slot            = "2305843009213693953"
  pin             = "AAAA-BBBB-CCCC-DDDD"
  key_label       = "vault-hsm-key"
  hmac_key_label  = "vault-hsm-hmac-key"
}

5b [Vault Enterprise] Describe the benefits and use cases of seal wrap (PKCS#11)

  • "seal wrapping" (SW) essentially provides "double encryption" by encrypting the data using keys stored on an HSM
  • provides FIPS 140-2 compliance by integrating with an HSM
  • allows Vault to be deployed in high-security GRC environments (PCI, HIPAA, DoD, NATO)
  • SW is enabled by default on the supported seals (can be disabled with disable_sealwrap=true in the config):
    • recovery key
    • any stored key shares
    • root key
    • keyring
  • other backend mounts can have SW enabled per SE with seal_wrap=true configuration:
# Enable a secrets engine with SW
v secrets enable -seal-wrap kv
# List the enabled secrets engines, including SW column (true vs false)
v secrets list -detailed

6 Scale Vault for performance

6a Use batch tokens

  • Difference between ST (hvs.) & BT (hvb.) tokens:
| | Service Tokens | Batch Tokens |
|---|---|---|
| Can be root tokens | Yes | No |
| Can create child tokens | Yes | No |
| Renewable | Yes | No |
| Listable | Yes | No |
| Manually revocable | Yes | No |
| Can be periodic | Yes | No |
| Can have explicit Max TTL | Yes | No (always uses a fixed TTL) |
| Has accessors | Yes | No |
| Has cubbyhole | Yes | No |
| Revoked with parent (if not orphan) | Yes | Stops working |
| Dynamic secrets lease assignment | Self | Parent (if not orphan) |
| Can be used across Performance Replication clusters | No | Yes (if orphan) |
| Creation scales with performance standby node count | No | Yes |
| Cost | Heavyweight | Lightweight |

What makes BT tokens special?

  • can be used for DR Replication cluster promotion as well
  • includes information such as policy, TTL, and other attributes
  • batch tokens are encrypted using the barrier key, which is why they can be used across all clusters within the replica set
  • non-orphan batch tokens do not get replicated to secondary performance clusters
  • orphan batch tokens (no parent) are replicated and valid on any secondary performance cluster
# Creating an orphan batch token, will be replicated
v token create -type=batch -orphan=true -policy=hcvop
# Through traditional auth methods
v write auth/approle/role/hcvop policies=devops token_type="batch" token_ttl="60s"

DR Operation Batch Token

  • eliminates the requirement to generate a DR operation token using the unseal/recovery keys
  • must have the proper capabilities to promote a secondary and perform related actions
# To promote a secondary
path "sys/replication/dr/secondary/promote" {
  capabilities = ["update"]
}

# To update the primary to connect
path "sys/replication/dr/secondary/update-primary" {
  capabilities = ["update"]
}

# Only if using integrated storage (raft) as the storage backend
# To read the current autopilot status
path "sys/storage/raft/autopilot/state" {
  capabilities = ["update", "read"]
}
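
Creating the DR operation batch token on the primary BEFORE a failure - a sketch, assuming the policy above was saved as dr-secondary-promotion.hcl:
v policy write dr-secondary-promotion dr-secondary-promotion.hcl
v token create -type=batch -policy=dr-secondary-promotion -ttl=8h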

6b [Vault Enterprise] Describe the use cases of performance standby nodes

  • Vault open source (OS): when a client sends a request to a standby node, the request is forwarded to the active node => scale up
  • Vault Enterprise: when a client sends a read to a performance standby node, it responds directly; writes are forwarded to the active node => scale out
  • What is a read? Any operation that does NOT result in a storage write to the backend is considered a READ - not necessarily only HTTP GET or vault read!
  • Examples of read-only actions:
    • reading secrets stored in the Key/Value engine
    • transit Secrets Engine - Encrypt or Decrypt operations
    • sign SSH client keys
  • to scale a Vault Enterprise cluster, performance standby nodes can respond to read requests from clients rather than sending the request to the Active node
  • applications known to require reads can be directed to performance standby nodes
    • this will help offload traffic from the Active node and allow you to scale OUT your cluster
  • performance Standby nodes can still take over as an Active node to continue providing high-availability within the local cluster
  • reminder: Performance Standby functionality is a Vault Enterprise feature!
  • eventual consistency: reading directly after a write may return stale data; give it some time to replicate everywhere
  • performance standby has health endpoint (commonly accessed by load balancers): /sys/health
  • automatically enabled when you have an Enterprise license, otherwise disable_performance_standby=true in the config
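Example of checking the health endpoint mentioned above - the HTTP status code differentiates node roles:
curl -s -o /dev/null -w "%{http_code}" https://vault.hcvop.com:8200/v1/sys/health
# 200 = initialized, unsealed, active
# 429 = unsealed, standby
# 473 = performance standby
# 503 = sealed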

6c [Vault Enterprise] Enable and configure performance replication

  • performance replication between primary and secondary cluster(s):
    • replicates the underlying configuration, policies, and other data
    • ability to service reads from client requests
    • clients will authenticate to the performance replicated cluster separately (talking directly with secondary cluster, they don't have to talk to primary)
    • does not replicate tokens or leases to performance secondaries
    • provides active/active solution for applications running in multiple data centers
    • applications authenticate with the LOCAL Vault cluster. Tokens are created and maintained on each cluster and are not replicated via Perf Replication
    • if a cluster becomes unavailable and you failover, applications will need to reauthenticate with the new Vault cluster. Exception here is if you failover to a DR cluster
    • performance replicated clusters handle the retrieval of secrets and the generation of dynamic credentials for local clients
    • these requests are handled locally and tokens/leases not replicated to the primary cluster
    • this helps offload some WRITE operations from the primary because the local cluster handles them and doesn't forward them to the primary cluster
    • any request to write data to the KV, write/updates Vault policies, make Vault configuration changes, etc. WILL be forwarded to the primary
# Primary cluster
v write -f sys/replication/performance/primary/enable
v write sys/replication/performance/primary/secondary-token id=<id>
# Secondary cluster - everything on this cluster is erased and replaced by primary data!
v login
v write sys/replication/performance/secondary/enable token=<token>
# check status of performance replication, state 'stream-wals' means it completed
v read -format=json sys/replication/performance/status

6d [Vault Enterprise] Create a paths filter

  • regulatory compliances may restrict you from replicating certain data (GDPR)
  • for example, customer PII stored in Europe should not be replicated to an APAC cluster
  • Vault has a Paths Filters capability when using Performance Replication
  • for each of secondary clusters, you can configure:
    • allowlist - only the selected paths are included for replication to the secondary
    • denylist - all paths will be replicated except the selected mount paths
  • you can mark a secrets engine or auth method as local so it is not replicated or removed by replication configurations (stays local to the particular cluster and will not be replicated)
v secrets enable -local -path=apac kv-v2
# The Secondary Cluster ID is us-east-dr, Mode = allow or deny, Paths to include in the list
v write sys/replication/performance/primary/paths-filter/us-east-dr mode=allow paths=aws/,hcvop/,customers/

7 Configure access control

7a Interpret Vault identity entities and groups

  • Vault creates an entity and attaches an alias to it if a corresponding entity doesn't already exist
  • this is done using the Identity secrets engine, which manages internal identities that are recognized by Vault
  • an entity is a representation of a single person or system used to log into Vault. Each has a unique value. Each entity is made up of zero or more aliases
  • an alias is a combination of the auth method plus some identification - it is a mapping between an entity and auth method(s)
  • example - aliases:
    UserPass: jsmith    # alias
    Entity_ID: abcd-123 # entity
    Policy: accounting
    
    GitHub: jsmith22    # alias
    Entity_ID: efgh-456 # entity
    Policy: finance
    
    • an entity can be manually created to map multiple aliases for a single user to provide more efficient authorization management
    • any tokens that are created for the entity inherit the capabilities that are granted by alias(es)
  • example - entity:
    Name: Julie Smith
    Entity_ID: ijkl-789
    Policy: management
    Aliases:
      - UserPass: jsmith
      - GitHub: jsmith22
    
  • example how manually create entities in CLI:
# Create a base entity, get 'entity_id':
v write identity/entity name="Julie Smith" policies="it-management" metadata="organization"="HCVOP, Inc" metadata="team"="management"
# Grab the 'auth_accessor':
v auth list
# Add GitHub auth as an alias:
v write identity/entity-alias name="jsmith22" canonical_id=<entity_id> mount_accessor=<github_auth_accessor>
# Add LDAP auth as an alias:
v write identity/entity-alias name="[email protected]" canonical_id=<entity_id> mount_accessor=<ldap_auth_accessor>

Vault Groups

  • group can contain multiple entities as its members
  • group can also have subgroups
  • policies can be set on the group and the permissions will be granted to all members of the group
  • example - group:
    Group name: Finance_team
    Policy: finance
    Members:
      - Name: Maria Shi
        Entity_ID: abcd-123
        Policy: accounts_payable
        Entity_aliases:
          - Username: maria.shi
            Policy: base-user
      - Name: John Lee
        Entity_ID: efgh-456
        Policy: management
        Entity_aliases:
          - Username: john.lee
            Policy: super-user
    # token inherits capabilities granted by alias, entity, and the group (John Lee will get finance, management, super-user)
    
  • 2 types of groups (see the CLI sketch after this list):
    • internal
      • created manually in Vault to group entities and propagate identical permissions
      • internal groups can be used to easily manage permissions for entities
      • frequently used when using Vault Namespaces to propagate permissions down to child namespaces
      • helpful when you don't want to configure an identical auth method on every single namespace
    • external
      • created manually or automatically, groups which Vault infers and creates based on group associations coming from auth methods
      • used to set permissions based on group membership from an external identity provider, such as LDAP, Okta, or OIDC provider
      • allows you to set up once in Vault and continue to manage permissions in the identity provider
      • note that the group name must match the group name in your identity provider
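  • example of creating both group types via CLI - a sketch, IDs and accessors are placeholders:
# Internal group: members are managed manually inside Vault
v write identity/group name="finance_team" type="internal" policies="finance" member_entity_ids="<entity_id>"
# External group: membership comes from the identity provider via a group alias
v write identity/group name="engineers" type="external" policies="engineering"
# The alias name must match the group name in the identity provider (e.g. LDAP)
v write identity/group-alias name="engineers" mount_accessor="<ldap_auth_accessor>" canonical_id="<group_id>"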

7b Write, deploy, and troubleshoot ACL policies

  • Vault policies provide operators a way to permit or deny access to certain paths or actions within Vault (RBAC)
  • policies are written in declarative statements and can be written using HCL (or JSON)
  • when writing policies, always follow the principle of least privilege (grant only the permissions they need)
  • policies are deny by default (implicit deny) - therefore you must explicitly grant access to paths and related capabilities to Vault clients
  • policies support an explicit DENY that takes precedence over any other permission
  • policies are attached to a token
  • token can have multiple policies, policies are cumulative and capabilities are additive
  • out of the box policies:
    • root - v policy read root results in an error because the root policy contains no rules
      • superuser with all permissions
      • cannot change nor delete this policy
      • attached to all root tokens
    • default - v policy read default - will show all the default capabilities
      • provides common permissions
      • you can change this policy, but it cannot be deleted
      • attached to all non-root tokens by default (can be removed if needed)
  • everything is path-based, so a policy has a path and its capabilities (list of permissions)
  • root protected paths (important/critical paths) require a root token or sudo capability
  • capabilities are specified as list of strings
    • create - if the key does not yet exist
    • read - read credentials, configurations, etc
    • update - if the key exists and you want to replace/update it
    • delete
    • list - doesn't allow you to read
    • sudo - allows access to paths that are root-protected
    • deny - always takes precedence over any other capability
  • the glob (*) is a wildcard and can only be used at the end of a path (can be used to signify anything "after" a path or as part of a pattern)
  • the plus (+) supports wildcard matching for a single directory in the path (can be used in multiple path segments)
  • you can combine * and + like e.g. path "secret/apps/+/team-*"
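  • for example, a single rule combining both wildcards (hypothetical paths, just a sketch):
path "secret/apps/+/team-*" {
  # '+' matches exactly one path segment (e.g. dev, prod); '*' matches any suffix (team-a, team-web, ...)
  capabilities = ["read", "list"]
}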
  • ACL templating allows you to use variable replacement; define paths containing double curly braces, e.g. {{identity.entity.id}}:

Parameter | Description
--- | ---
identity.entity.id | The entity's ID
identity.entity.name | The entity's name
identity.entity.metadata.<<metadata key>> | Metadata associated with the entity for the given key
identity.entity.aliases.<<mount accessor>>.id | Entity alias ID for the given mount
identity.entity.aliases.<<mount accessor>>.name | Entity alias name for the given mount
identity.entity.aliases.<<mount accessor>>.metadata.<<metadata key>> | Metadata associated with the alias for the given mount and metadata key
identity.groups.ids.<<group id>>.name | The group name for the given group ID
identity.groups.names.<<group name>>.id | The group ID for the given group name
identity.groups.ids.<<group id>>.metadata.<<metadata key>> | Metadata associated with the group for the given key
identity.groups.names.<<group name>>.metadata.<<metadata key>> | Metadata associated with the group for the given key
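  • for example, a templated policy granting each entity its own private space (assumes a KV v2 mount named 'secret'):
path "secret/data/{{identity.entity.name}}/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}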
  • administrative policies include:
    • Licensing
    • Setup New Vault Cluster
    • Configure UI
    • Rotate Keys
    • Seal Vault
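  • a minimal sketch of an administrative policy covering two of these tasks (root-protected paths need sudo):
# Rotate the underlying encryption key
path "sys/rotate" {
  capabilities = ["update", "sudo"]
}
# Seal the Vault cluster
path "sys/seal" {
  capabilities = ["update", "sudo"]
}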
  • standard CLI operations with policies:
# List all policies
v policy list
# Delete a policy
v policy delete <name>
# Format the HCL file
v policy fmt <policy.hcl>
# See the policy contents
v policy read <name>
# Update or create a policy
v policy write <name> <policy.hcl>

# Managing policy inline
v policy write webapp -<< EOF
path "kv/data/apps/*" {
  capabilities = ["read","create","update","delete"]
}
path "kv/metadata/*" {
  capabilities = ["read","create","update","list"]
}
EOF

# Creating new policy via API:
curl --header "X-Vault-Token: hvs.bCEo8HFNIIR8wRGAzwUk" --request PUT --data @payload.json http://127.0.0.1:8200/v1/sys/policy/webapp

# Testing policies - e.g. must be able to request AWS credential granting read access to S3, read secrets from 'secret/apikey/Google'
v token create -policy="web-app"
v login <token> # Authenticate with the newly generated token
v read secret/apikey/Google # Make sure that the token can read
v write secret/apikey/Google key="ABCDE12345" # This should fail
v read aws/creds/s3-readonly # Request a new AWS credentials
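# Troubleshooting tip (sketch): check what a token can do on a path without making the actual request
v token capabilities <token> secret/apikey/Google # expect: read
v token capabilities <token> aws/creds/s3-readonly # expect: read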
  • example of policy #1 - requirements:
    • access to generate database credentials at database/creds/dev-db01
    • create, update, read, and delete secrets stored at kv/apps/dev-app01
    path "database/creds/dev-db01" {
      capabilities = ["read"]
    }
    path "kv/apps/dev-app01" {
      capabilities = ["create", "read", "update", "delete"]
    }
  • example of policy #2 - requirements:
    • access to read credentials under the path kv/apps/webapp
    • deny access to kv/apps/webapp/super_secret
    # does not grant access to kv/apps/webapp itself, only paths below it
    # does not permit browsing kv/apps/webapp via the UI, since we don't have 'list'
    path "kv/apps/webapp/*" {
      capabilities = ["read"]
    }
    path "kv/apps/webapp/super_secret" {
      capabilities = ["deny"]
    }

7c [Vault Enterprise] Understand Sentinel policies

  • Sentinel is an embeddable policy as code framework to enable fine-grained, logic-based policy decisions that can be extended to source external information to make decisions
  • not just a Vault feature, available in all HashiCorp Enterprise products
  • for example time-based requests (denied access during certain time)
  • 2 types of sentinel policies:
    • Role Governing Policies (RGPs) - tied to tokens, identity entities/groups
    • Endpoint Governing Policies (EGPs) - tied to paths instead of tokens (even on unauthenticated paths)
  • 3 types of enforcement levels:
    • advisory - The policy is allowed to fail, you can see it in the logs
    • soft-mandatory - The policy must pass unless an override is specified (use the -policy-override flag)
    • hard-mandatory - The policy must pass no matter what
  • policy evaluation / hierarchy:
    1. Request is unauthenticated -> evaluate EGPs on the requested path, then deny or allow
    2. Request is authenticated -> evaluate the token's ACL policies, then deny or proceed to step 3
    3. If permission is granted in ACL -> evaluate RGPs attached to the token, then deny or proceed to step 4
    4. If permission is granted in RGPs -> evaluate EGPs on the requested path, then deny or allow
    5. a root token skips steps 1-4 entirely!
  • topology of sentinel policy:
import "<library>"                          # access to reusable libraries to import information or use features
<variable> = <value>                        # optional, dynamically typed variable
<name> = rule { <condition_to_evaluate> }   # can have more than one rule
main = rule {                               # required field
  <condition_to_evaluate>                   # describe a set of conditions resulting in either true or false
}
  • different imports: base64, decimal, http, json, runtime, sockaddr, strings, time, types, units, version
  • example 1 - rgp:
main = rule {                                         # allow a specific entity or group
  identity.entity.name is "jeff" or                   # the ID check fails if "jeff" is deleted and recreated (the entity ID changes); the name match alone would still pass
  identity.entity.id is "fe2a5bfd-c483-9263-b0d4-f9d345efdf9f" or
  "sysops" in identity.groups.names or
  "14c0940a-5c07-4b97-81ec-0d423accb8e0" in keys(identity.groups.by_id)
}
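  • deploying an RGP via the CLI (a sketch; assumes the rule above is saved as allow-sysops.sentinel):
v write sys/policies/rgp/allow-sysops policy=@allow-sysops.sentinel enforcement_level="hard-mandatory"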
  • example 2 - egp:
import "time"                                             # disallow all previously-generated tokens based on date (xmas)
main = rule when not request.unauthenticated {            # you could apply this EGP to the '*' endpoint
  time.load(token.creation_time).unix >
    time.load("2022-12-25T00:00:01Z").unix
}
  • example 3 - egp:
import "sockaddr"
import "mfa"
import "strings"
# We expect logins to come only from a specific private IP range
cidrcheck = rule {
  sockaddr.is_contained(request.connection.remote_addr, "10.0.23.0/16")
}
# Require Ping MFA validation to succeed
ping_valid = rule {
  mfa.methods.ping.valid
}
main = rule when request.path is "auth/ldap/login" {      # sets the scope of policy
  ping_valid and cidrcheck                                # must also pass both rules
}
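  • deploying an EGP via the CLI (a sketch; assumes the policy above is saved as mfa-cidr.sentinel):
v write sys/policies/egp/mfa-cidr policy=@mfa-cidr.sentinel enforcement_level="hard-mandatory" paths="auth/ldap/login"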

7d [Vault Enterprise] Define control groups and describe their basic workflow

  • control groups (CG) add an additional authorization requirement on configured paths within Vault
  • not a lot of people are using it
  • when a CG is created (we apply CG on a specific path), the following will occur:
    1. client makes a request to a path as normal
    2. Vault returns a wrapping token & accessor - rather than the requested secrets/data directly
    3. controllers/managers/authorizers defined in the CG policy must approve/authorize this request
    4. once all authorizations are satisfied, the client can unwrap the secrets
  • CG requirements can be specified in either ACL policies or within a Sentinel policy
  • currently, the only supported CG factor is an Identity Group (authorizer must belong to a specific identity group)
  • policy will define the group(s), who are approvers for the requested path
  • example of CG in Vault policy:
path "kv/data/customers/orders" {
  # Regular ACL Policy
  capabilities = ["read"]
  control_group = {
    factor "acct_manager" {
      # Control Group
      identity {
        group_names = ["account-managers"]
        # We need 2 account managers to approve this request
        approvals = 2
      }
    }
  }
}
  • example of CG in Sentinel EGP (deploy this EGP against e.g. kv/data/customers/orders):
import "controlgroup"

control_group = func() {
  numAuthzs = 0
  for controlgroup.authorizations as authz {
    if "account-managers" in authz.groups.by_name {
      numAuthzs = numAuthzs + 1
    }
  }
  # we need 2 account managers to approve this request
  if numAuthzs >= 2 {
    return true
  }
  return false
}

main = rule {
  control_group()
}
  • example in CLI:
# Authenticate with a token tied to a policy with a Control Group:
v login hvs.CAESIA7Y-LwSxnE926onQwdxIUF7w7KJ5-
# Request data from KV store like usual, but got 'wrapping_token', 'wrapping_accessor':
v kv get kv/customers/orders
# Authorizer needs to go to UI: 'Access' - 'Control Groups' - 'Accessor' - 'Lookup' - 'Authorize'
# User then can go to 'Tools' - 'Unwrap' and be able to use it
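  • the approvals can also be handled via the CLI/API instead of the UI (a sketch; accessor/token values are placeholders):
# Authorizer approves the pending request using the wrapping accessor:
v write sys/control-group/authorize accessor=<wrapping_accessor>
# Requester checks whether the request has been fully authorized:
v write sys/control-group/request accessor=<wrapping_accessor>
# Once enough approvals are collected, the requester unwraps the secret:
v unwrap <wrapping_token>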

7e [Vault Enterprise] Describe and interpret multi-tenancy with namespaces

  • provides isolated/segregated environments within a single Vault installation (multi-tenant, but centrally managed)
  • allows delegation of Vault responsibilities
  • Each namespace (NS) can have its own:
    • Policies
    • Auth Methods
    • Secrets Engines
    • Tokens
    • Identities
  • default NS is root
  • NS are hierarchical like folder structure (root can have child and that child can have another child etc)
  • paths & ACLs are relative to NS (you can have the same name KVs in multiple NS)
  • tokens are only valid in a single namespace, but you can create an entity that has access to other (child) namespaces
  • administrative delegation - Vault engineers/admins (e.g. cluster nodes, audit, root NS, storage, etc.) vs NS admins (SEs, policies, auth methods, etc.)
  • authenticating to NS - directly to a child NS or to root NS
  • writing policies for NS - the path is relative to the NS, for example:
    • cloud-team NS has path of "database/creds/prod-db"
    • but if you write a policy in the root NS, you need path of "cloud-team/database/creds/prod-db"
# Create NS
v namespace create <NS>
# Create a child NS
v namespace create -namespace=hcvop <NS>
# List NS
v namespace list
# Delete a NS
v namespace delete <NS>
# Set NS Environment Variable, then run commands as normal
export VAULT_NAMESPACE=<NS>
# or reference a NS on the CLI when running a command
v kv get -namespace=<NS> kv/data/sql/prod
v auth enable -namespace=cloud-team userpass
# authenticating to a NS
v login -namespace=cloud-team -method=userpass username=bryan

# referencing NS in the API with a header
curl --header "X-Vault-Token: hvs.a83b50ed2aa548212" --header "X-Vault-Namespace: development/" --request GET https://vault.hcvop.com:8200/v1/kv/data/sql/prod
# referencing the namespace using the API path
curl --header "X-Vault-Token: hvs.CAESIA7Y-LwSxnE926onQwdxIUF7" --request GET https://vault.hcvop.com:8200/v1/development/kv/data/sql/prod
  • example: Create a policy in the education NS that gives the users access to work with the database secrets engine. The policy should permit CRUD and list permissions to everything under the database secrets engine. The name of the policy should be database-full-access:
v policy write -namespace=education database-full-access -<< EOF
path "database/*" {
  capabilities = ["create", "update", "delete", "read", "list" ]
}
EOF
  • example: Create a policy in the root namespace that gives the users access to work with the database secrets engine in the education namespace. The policy should permit CRUD and list permissions to everything under the database secrets engine. The name of the policy should be ns-education-db-full-access:
v policy write ns-education-db-full-access -<< EOF
path "education/database/*" {
  capabilities = ["create", "update", "delete", "read", "list" ]
}
EOF

8 Configure Vault Agent

8a Securely configure auto-auth and token sink

  • Vault Agent is a client daemon that runs alongside an application to enable legacy applications to interact with Vault and consume secrets
  • provides several different features:
    • automatic authentication including renewal (e.g. AppRole, AWS, Azure, Kubernetes)
    • secure delivery/storage of tokens (response wrapping)
    • local caching of secrets
    • templating
  • auto-auth process guarantees a valid token is always available to the application:
    1. the legacy application and the Vault Agent run alongside each other (sidecar)
    2. the agent authenticates against Vault so the legacy app doesn't have to (uses a pre-defined auth method)
    3. the agent receives a token and stores it in the "sink" (plain/flat text file)
    4. the legacy app reads the token from the sink and invokes the Vault API with it
    5. the agent understands the TTL and also handles renewal
  • example - vault agent config file:
# Auto-auth config (AppRole)
auto_auth {
  method "approle" {
    mount_path = "auth/approle"
    # wrap_ttl = "5m" - optional, more secure, protects against MITM attacks BUT cannot renew the token
    config = {
      role_id_file_path   = "<path-to-file>"
      secret_id_file_path = "<path-to-file>"
    }
  }
# Sink config - only supported method is 'file'
  sink "file" {
    # wrap_ttl = "5m" - optional, less secure, the token stored on the system will be wrapped BUT allows renew when expired
    config = {
      path = "/etc/vault.d/token.txt"
      # mode - optional, file permissions (default = 640)
    }
  }
}
# Vault config
vault {
  address = "http://<cluster_IP>:8200"
}
  • execute the Vault Agent: v agent -config=agent.hcl
  • note: for security purposes, Vault Agent will remove the file containing the AppRole's secret_id, unless remove_secret_id_file_after_reading = false is set
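  • a minimal systemd unit sketch (hypothetical paths) so the agent starts at boot and restarts on failure:
[Unit]
Description=Vault Agent
After=network-online.target

[Service]
User=vault
ExecStart=/usr/bin/vault agent -config=/etc/vault.d/agent.hcl
Restart=on-failure

[Install]
WantedBy=multi-user.target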

8b Configure templating

  • how can legacy applications still take advantage of Vault for secrets if they can't talk to Vault by themselves?
  • to further extend the functionality of the Vault Agent, a subset of the Consul-Template (CT) functionality is directly embedded into the Vault Agent
  • no need to install the Consul-Template binary on the application server
  • Vault secrets can be rendered to destination file(s) using the CT markup language
  • uses the client token acquired by the auto-auth configuration
  • workflow:
    1. agent auth with Vault
    2. Vault returns client token
    3. agent stores token in sink
    4. agent reads secrets based on the template
    5. agent renders the secrets to the output file
    6. the application reads this file and never needs to know about the agent
  • example - templated file:
production:
  adapter: postgresql
  encoding: unicode
  database: orders
  host: postgres.hcvop.com
  {{ with secret "database/creds/readonly" }}
  username: "{{ .Data.username }}"
  password: "{{ .Data.password }}"
  {{ end }}
  • example - destination file (rendered result):
production:
  adapter: postgresql
  encoding: unicode
  database: orders
  host: postgres.hcvop.com
  username: "v-vault-readonly-fm3dfm20sm2s"
  password: "fjk39fkj49fks02k_3ks02mdz1s1"
  • template config:
# auto_auth stanza
...OMITTED...
# Global template configurations (affects all templates)
template_config {
  exit_on_retry_failure           = true
  # Vault Agent can continuously update the application's configuration file based on any changes made in Vault
  static_secret_render_interval   = "10m"
}
# Template configuration
template {
  source        = "/etc/vault/web.tmpl"
  destination   = "/etc/webapp/config.yml"
}
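  • the template stanza can also run a command after rendering, e.g. to reload the app when a secret changes (a sketch; the reload command is hypothetical):
template {
  source      = "/etc/vault/web.tmpl"
  destination = "/etc/webapp/config.yml"
  command     = "systemctl reload webapp"
}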

Appendix - Exam experience & Tips

  • a mix of longer multiple-choice questions, hands-on labs where you configure Vault, and hybrid questions where you read a scenario and look at Vault to determine the answer
  • everything done via browser, you can access documentation
  • exam is run on containers and uses Portainer.io - you can start/stop/restart/pause/resume, you can look at the logs when you click the container

Author: @luckylittle

Last update: Wed Apr 26 03:24:11 UTC 2023

Q1: Multi-choice - Your organization is running Vault Enterprise for production workloads. Due to high workloads, your production cluster cannot keep up with the number of read requests. As a result, applications are displaying errors to customers and the business is at risk of losing significant revenue. How can you reconfigure the Vault cluster to provide scale-out read capabilities without taking an outage on any of the Vault cluster nodes?

a) Increase the memory for each node by changing the underlying hardware

b) Determine the secrets engine being overused and enable multiple secrets engines to spread the workload across them

c) Reconfigure the load balancer with another listener, point it to the performance standby nodes, and reconfigure read-only applications to send requests to the new listener

d) Deploy another Vault cluster and configure disaster recovery replication. Redirect some clients to the new cluster.

A1: C

Q2: Hybrid - You are the Vault engineer at your organization. A user recently stated they cannot log in to Vault and interact with the proper Namespace assigned to their team. Based on the policies that are already created in Vault, determine which policy permits them to authenticate to the root namespace and interact with the namespace called mobile-team-a. Click HERE to open an SSH session to the Vault cluster node. Determine the correct policy to use and select the answer below.

a) operation-policy

b) automation-ro-policy

c) training-team-policy

d) developer-team-a-policy

A2: C

Q3: Lab - Initialize a Vault node using Integrated Storage:

  • Demonstrate you can create a Vault server configuration file
  • Know how to configure Auto Unseal
  • Set up an HA cluster
  • Initialize a Vault server using production hardening techniques

Q4: Lab - Vault Agent and Templating:

  • Set up a Vault Agent to authenticate to Vault (Vault Agent Auto-Auth)
  • Retrieve Secrets (Vault Agent Templates)

Q5: Lab - Vault Enterprise Replication:

  • Enable and Configure DR Replication and Performance Replication
  • Know how to configure a paths filter
The file Hashicorp Vault Pro v0.1.pdf is available via Dropbox, Box, GDrive, and 1Drive.
