With an ACM must-gather of the hub cluster, navigate into a managed cluster namespace, then run this script to identify possibly affected policies:
```bash
# Print the timestamp of the 10th compliance history entry (if present) for each
# policy file; a full history with recent timestamps suggests frequent re-evaluation.
for file in *.yaml; do
  ts_ten="$(yq e '.status.details[].history[9].lastTimestamp | select(. != null)' "${file}")"
  if [[ -n "${ts_ten}" ]]; then
    echo "${ts_ten} - ${file}"
  fi
done
```
Any recent timestamps (relative to when the must-gather was collected) should be investigated.
First, check the status in that policy to identify which ConfigurationPolicy is flapping (if multiple are defined in the Policy). Then examine the template to see whether it matches any of the known issues listed below.
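As a rough illustration only (the template name, timestamps, and messages below are made up, and the exact message text varies), a flapping template tends to show alternating compliance events piling up in the replicated Policy's status history:

```yaml
status:
  details:
    - templateMeta:
        name: example-configpolicy    # hypothetical ConfigurationPolicy name
      compliant: NonCompliant
      history:
        - lastTimestamp: "2023-06-21T12:05:10Z"
          message: NonCompliant; violation - ...
        - lastTimestamp: "2023-06-21T12:05:02Z"
          message: Compliant; notification - ...
        - lastTimestamp: "2023-06-21T12:04:55Z"
          message: NonCompliant; violation - ...
        # ...new events every few seconds, which is why the script above checks
        # whether a 10th history entry exists
```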
Some of these issues are already fixed in some versions of ACM. Many of these issues affected clusters to some extent before, but were not noticed until ACM 2.7, when the flapping events became more prominent.
Incorrect fields defined in the object might be quietly dropped when the resource is created or updated. Misspellings and/or incorrect indentation are common causes, and this seems to be more common when working with custom resources. Because the dropped field never shows up on the cluster, the controller keeps seeing a difference between the policy and the live object, and keeps re-evaluating or re-applying it.
Solution: Identify the problematic fields by comparing what is returned by the k8s API server (from `kubectl get ... -o yaml`) and what is defined in the Policy. Fix the typos in the policy.
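As a hypothetical sketch of this failure mode (the object and the typo below are invented for illustration; in practice this comes up more often with custom resources):

```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: example-sa
        namespace: default
      # Typo: the real field is "automountServiceAccountToken" (no trailing "s").
      # The API server may quietly drop the unknown field, so the object on the
      # cluster never matches the template and the policy keeps re-evaluating.
      automountServiceAccountTokens: false
```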
I'm not completely sure what triggers this. The policy may or may not be "flapping" between Compliant and NonCompliant, but it constantly evaluates and sends status updates (so it will be flagged by the script above in this gist), which can clutter the history and potentially cause other issues.
Solution: If possible, create separate ConfigurationPolicies for each namespace. As of 2023-06-21, there is not a fix for this in any released version.
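A rough sketch of that workaround, with hypothetical names, assuming the original ConfigurationPolicy used a `namespaceSelector` spanning several namespaces:

```yaml
# One ConfigurationPolicy per namespace (repeat for "ns-two", "ns-three", etc.),
# instead of a single ConfigurationPolicy that selects all of them.
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: example-config-ns-one
spec:
  remediationAction: enforce
  severity: low
  namespaceSelector:
    include: ["ns-one"]    # exactly one namespace per ConfigurationPolicy
  object-templates:
    - complianceType: musthave
      objectDefinition:
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: example
        data:
          key: value
```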
(This issue can occur on other kinds, but SecurityContextConstraints and PodSecurityPolicies have been the most common in my experience.)
Some fields set in the policy may be omitted when returned by the k8s API server. This can easily occur with booleans set to false, or lists that are empty: for example, if the policy for a SecurityContextConstraints sets `allowedFlexVolumes: []`, or the policy for a PodSecurityPolicy sets `hostPID: false`.
Solution: Identify the problematic fields by comparing what is returned by the k8s API server (from `kubectl get ... -o yaml`) and what is defined in the Policy. Remove fields from the policy that are not being returned by the API server.
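For instance (a hypothetical snippet, not a complete SecurityContextConstraints definition), the fix is usually just deleting the field that the API server never returns:

```yaml
objectDefinition:
  apiVersion: security.openshift.io/v1
  kind: SecurityContextConstraints
  metadata:
    name: example-scc
  allowPrivilegedContainer: false
  # If the API server never returns this empty list (check with
  # kubectl get ... -o yaml), remove it from the policy so the comparison settles:
  allowedFlexVolumes: []
```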
The policy might define the Secret using `stringData`, but when it checks whether what it "wants" matches what is on the API server, the Secret it gets back will only have `data`.
Solution: Use `data`, and the template function `base64enc`.
For example, instead of:
```yaml
stringData:
  foo: bar
```
use:
```yaml
data:
  foo: '{{ printf "bar" | base64enc }}'
```
More complex cases may require more involved template trickery. For example, instead of:
```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: my-secret
        namespace: default
      stringData:
        file.yaml: |
          foo: bar
          username: {{ printf "{{hub fromSecret "policies" "source" "user" hub}}" | base64dec }}
          password: {{ fromSecret "default" "source" "pass" | base64dec }}
```
use:
```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: my-secret
        namespace: default
      data:
        file.yaml: |-
          {{printf `
          foo: bar
          username: %s
          password: %s`
          (printf "{{hub fromSecret "policies" "source" "user" hub}}" | base64dec)
          (fromSecret "default" "source" "pass" | base64dec)
          | trim | replace `
          ` "\n" | printf "%s\n" | base64enc}}
```
The example above ensures that the file will end with exactly one newline. The `trim` step adds some "forgiveness" to the template, so additional newlines in the "main" part can be used to separate it more cleanly from the pieces that will be injected, and different YAML editors can add spaces without changing the final output (I'm looking at you, ACM UI editor).
The Kubernetes API will remove the trailing newline from base64-encoded Secret data. The YAML encoding can make it tricky to see the newline, so this example uses JSON-style encoding in the important part (a nice reminder that JSON is a subset of YAML). Applying this YAML:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
data: {"foo":"ZW5jb2RlIG1lIQ==\n"}
```
results in this Secret being created on the cluster:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
data: {"foo":"ZW5jb2RlIG1lIQ=="}
```
This most often occurs when using multiline YAML syntax in a policy. This example includes good and bad keys:
```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: mixed-example
      data:
        good.yaml: |-
          {{printf `
          line1: foo
          line2: bar
          ` | base64enc }}
        good: '{{ printf "foo: bar" | base64enc }}'
        bad.yaml: |
          {{ printf "fizz: buzz" | base64enc }}
```
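For completeness, the `bad.yaml` key can be made "good" the same way as the others, by using the strip indicator (`|-`) so the policy's value has no trailing newline for the API server to remove:

```yaml
bad.yaml: |-
  {{ printf "fizz: buzz" | base64enc }}
```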
Another controller on the cluster (the built-in ClusterRole aggregation controller) will populate the `rules` list in a ClusterRole based on its `aggregationRule`. If a policy defines both, the config-policy-controller and that other controller will constantly update the object.
Solution: Use a `musthave` policy, and only define the `aggregationRule`.
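A rough sketch of that shape (names and labels are hypothetical), leaving `rules` entirely to the aggregation controller:

```yaml
object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: example-aggregated
      aggregationRule:
        clusterRoleSelectors:
          - matchLabels:
              rbac.example.com/aggregate-to-example: "true"
      # Intentionally no "rules" key: with musthave, the rules the aggregation
      # controller fills in do not make the object NonCompliant.
```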