With an ACM must-gather of the hub cluster, navigate into a managed cluster namespace, then run this script to identify possibly affected policies:
```bash
# Print the timestamp of the 10th compliance history entry (if present) for each
# policy file; a full history with recent timestamps suggests frequent re-evaluation.
for file in *.yaml; do
  ts_ten="$(yq e '.status.details[].history[9].lastTimestamp | select(. != null)' "${file}")"
  if [[ -n "${ts_ten}" ]]; then
    echo "${ts_ten} - ${file}"
  fi
done
```
Any recent timestamps (relative to when the must-gather was collected) should be investigated.
First, check the status in that policy to identify which ConfigurationPolicy is flapping (if multiple are defined in the Policy). Then examine the template to see whether it matches any of the known issues listed below.
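As a rough illustration only (the template name, timestamps, and messages below are made up, and the exact message text varies), a flapping template tends to show alternating compliance events piling up in the replicated Policy's status history:

```yaml
status:
  details:
    - templateMeta:
        name: example-configpolicy    # hypothetical ConfigurationPolicy name
      compliant: NonCompliant
      history:
        - lastTimestamp: "2023-06-21T12:05:10Z"
          message: NonCompliant; violation - ...
        - lastTimestamp: "2023-06-21T12:05:02Z"
          message: Compliant; notification - ...
        - lastTimestamp: "2023-06-21T12:04:55Z"
          message: NonCompliant; violation - ...
        # ...new events every few seconds, which is why the script above checks
        # whether a 10th history entry exists
```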
Some of these issues are already fixed in some versions of ACM. Many of these issues affected clusters to some extent before, but were not noticed until ACM 2.7, when the flapping events became more prominent.
Incorrect fields defined in the object might be quietly dropped when the resource is created or updated. Misspellings and/or incorrect indentation are common causes, and this seems to be more common when working with custom resources. Because the dropped field never shows up on the cluster, the controller keeps seeing a difference between the policy and the live object, and keeps re-evaluating or re-applying it.
Solution: Identify the problematic fields by comparing what is returned by the k8s API server (from `kubectl get ... -o yaml`) and what is defined in the Policy. Fix the typos in the policy.
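As a hypothetical sketch of this failure mode (the object and the typo below are invented for illustration; in practice this comes up more often with custom resources):

```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: example-sa
        namespace: default
      # Typo: the real field is "automountServiceAccountToken" (no trailing "s").
      # The API server may quietly drop the unknown field, so the object on the
      # cluster never matches the template and the policy keeps re-evaluating.
      automountServiceAccountTokens: false
```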
I'm not completely sure what triggers this. The policy may or may not be "flapping" between Compliant and NonCompliant, but it constantly evaluates and sends status updates (so it will be flagged by the script above in this gist), which can clutter the history and potentially cause other issues.
Solution: If possible, create separate ConfigurationPolicies for each namespace. As of 2023-06-21, there is not a fix for this in any released version.
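A rough sketch of that workaround, with hypothetical names, assuming the original ConfigurationPolicy used a `namespaceSelector` spanning several namespaces:

```yaml
# One ConfigurationPolicy per namespace (repeat for "ns-two", "ns-three", etc.),
# instead of a single ConfigurationPolicy that selects all of them.
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: example-config-ns-one
spec:
  remediationAction: enforce
  severity: low
  namespaceSelector:
    include: ["ns-one"]    # exactly one namespace per ConfigurationPolicy
  object-templates:
    - complianceType: musthave
      objectDefinition:
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: example
        data:
          key: value
```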
(This issue can occur on other kinds, but SecurityContextConstraints and PodSecurityPolicies have been the most common in my experience.)
Some fields set in the policy may be omitted when returned by the k8s API server. This can easily occur with booleans set to false, or lists that are empty: for example, if the policy for a SecurityContextConstraints sets `allowedFlexVolumes: []`, or the policy for a PodSecurityPolicy sets `hostPID: false`.
Solution: Identify the problematic fields by comparing what is returned by the k8s API server (from `kubectl get ... -o yaml`) and what is defined in the Policy. Remove fields from the policy that are not being returned by the API server.
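For instance (a hypothetical snippet, not a complete SecurityContextConstraints definition), the fix is usually just deleting the field that the API server never returns:

```yaml
objectDefinition:
  apiVersion: security.openshift.io/v1
  kind: SecurityContextConstraints
  metadata:
    name: example-scc
  allowPrivilegedContainer: false
  # If the API server never returns this empty list (check with
  # kubectl get ... -o yaml), remove it from the policy so the comparison settles:
  allowedFlexVolumes: []
```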
The policy might define the Secret using `stringData`, but when it checks whether what it "wants" matches what is on the API server, the Secret it gets back will only have `data`.
Solution: Use `data`, and the template function `base64enc`.
For example, instead of:
```yaml
stringData:
  foo: bar
```
use:
```yaml
data:
  foo: '{{ printf "bar" | base64enc }}'
```
More complex cases may require more involved template trickery. For example, instead of:
```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: my-secret
        namespace: default
      stringData:
        file.yaml: |
          foo: bar
          username: {{ printf "{{hub fromSecret "policies" "source" "user" hub}}" | base64dec }}
          password: {{ fromSecret "default" "source" "pass" | base64dec }}
```
use:
```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: my-secret
        namespace: default
      data:
        file.yaml: |-
          {{printf `
          foo: bar
          username: %s
          password: %s`
          (printf "{{hub fromSecret "policies" "source" "user" hub}}" | base64dec)
          (fromSecret "default" "source" "pass" | base64dec)
          | trim | replace `
          ` "\n" | printf "%s\n" | base64enc}}
```
The example above ensures that the file will end with exactly one newline. The `trim` step adds some "forgiveness" to the template, so additional newlines in the "main" part can be used to separate it more cleanly from the pieces that will be injected, and different YAML editors can add spaces without changing the final output (I'm looking at you, ACM UI editor).
The Kubernetes API will remove the trailing newline from base64-encoded Secret data. The YAML encoding can make it tricky to see the newline, so this example uses JSON-style encoding in the important part (a nice reminder that JSON is a subset of YAML). Applying this YAML:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
data: {"foo":"ZW5jb2RlIG1lIQ==\n"}
```
results in this Secret being created on the cluster:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
data: {"foo":"ZW5jb2RlIG1lIQ=="}
```
This most often occurs when using multiline YAML syntax in a policy. This example includes good and bad keys:
```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: mixed-example
      data:
        good.yaml: |-
          {{printf `
          line1: foo
          line2: bar
          ` | base64enc }}
        good: '{{ printf "foo: bar" | base64enc }}'
        bad.yaml: |
          {{ printf "fizz: buzz" | base64enc }}
```
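For completeness, the `bad.yaml` key can be made "good" the same way as the others, by using the strip indicator (`|-`) so the policy's value has no trailing newline for the API server to remove:

```yaml
bad.yaml: |-
  {{ printf "fizz: buzz" | base64enc }}
```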
Another controller on the cluster (the built-in ClusterRole aggregation controller) will populate the `rules` list in a ClusterRole based on its `aggregationRule`. If a policy defines both, the config-policy-controller and that other controller will constantly update the object.
Solution: Use a `musthave` policy, and only define the `aggregationRule`.
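A rough sketch of that shape (names and labels are hypothetical), leaving `rules` entirely to the aggregation controller:

```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: example-aggregated
      aggregationRule:
        clusterRoleSelectors:
          - matchLabels:
              rbac.example.com/aggregate-to-example: "true"
      # Intentionally no "rules" key: with musthave, the rules the aggregation
      # controller fills in do not make the object NonCompliant.
```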