- Investigate upgrading Metering when using the `Manual` approval strategy.
- Investigate upgrading Metering when the newer 4.5 channel is available and the `Automatic` approval strategy has been configured.
- Verify the `Automatic` approval strategy behavior when a 4.4 cluster is upgraded to 4.5.
- Map out any scenarios where a Metering upgrade could fail.
- What's the rollback process? What happens when upgrading Metering to 4.5 fails and we need to roll back to 4.4?
-- Does OLM track the previous CSV version?
- Flesh out which checks are sufficient to ensure a Metering installation is "healthy"; this translates to post-upgrade checks as well.
- Investigate the openshift-docs k8s custom resource syntax and conform to that standard.
-- Ping Kevin L. about the syntax, e.g. `ReportDataSource` vs. ReportDataSource vs. report data source.
-- Investigate Metering vs. metering vs. Metering Operator.
- Investigate monitoring last seen (or last timestamp modified) for post-upgrade debuggability/verification checks.
- Investigate getting the MeteringConfig status field output without using an external tool, like `jq`.
- Investigate which `k get reportdatasource` columns make good verification checks for post-upgrade success.
- Investigate whether there are any good, simple post-upgrade Report verification checks.
Gathering all of the events that the metering-operator has fired off, based on last timestamp:
tflannag@localhost operator-framework [] ▶ oc get events --field-selector involvedObject.kind=MeteringConfig --sort-by='.lastTimestamp'
LAST SEEN TYPE REASON OBJECT MESSAGE
4m40s Normal Validating meteringconfig/operator-metering Validating the user-provided configuration
4m30s Normal Started meteringconfig/operator-metering Configuring storage for the metering-ansible-operator
4m26s Normal Started meteringconfig/operator-metering Configuring TLS for the metering-ansible-operator
3m58s Normal Started meteringconfig/operator-metering Configuring reporting for the metering-ansible-operator
3m53s Normal Reconciling meteringconfig/operator-metering Reconciling metering resources
3m47s Normal Reconciling meteringconfig/operator-metering Reconciling monitoring resources
3m41s Normal Reconciling meteringconfig/operator-metering Reconciling HDFS resources
3m23s Normal Reconciling meteringconfig/operator-metering Reconciling Hive resources
2m59s Normal Reconciling meteringconfig/operator-metering Reconciling Presto resources
2m35s Normal Reconciling meteringconfig/operator-metering Reconciling reporting-operator resources
2m14s Normal Reconciling meteringconfig/operator-metering Reconciling reporting resources
tflannag@localhost operator-framework [] ▶
Still not entirely sure how to get the MeteringConfig CR status field without using an external tool, like `jq`:
tflannag@localhost operator-framework [] ▶ k get meteringconfig -o jsonpath="{.items[*].status}"
map[conditions:[map[lastTransitionTime:2020-05-29T15:19:40.734311Z message:Awaiting the next reconciliation status:False type:Running]]]tflannag@localhost operator-framework [] ▶
Compare that with using `jq` instead of the built-in `jsonpath` output option:
tflannag@localhost operator-framework [] ▶ k get meteringconfig operator-metering -o json | jq '.status'
{
  "conditions": [
    {
      "lastTransitionTime": "2020-05-29T15:19:40.734311Z",
      "message": "Awaiting the next reconciliation",
      "status": "False",
      "type": "Running"
    }
  ]
}
tflannag@localhost operator-framework [] ▶
We could instead do something like this:
tflannag@localhost operator-framework [] ▶ oc get meteringconfig operator-metering -o=jsonpath='{.status.conditions[?(@.type=="Invalid")].message}'
"Invalid configuration for non-OKD distributions: You must set the reporting-operator.spec.config.prometheus.url."
tflannag@localhost operator-framework [] ▶
Essentially, while we wait for Metering to roll out, watch for changes to the MeteringConfig custom resource to ensure that no error has been encountered in the Ansible role.
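A minimal sketch of that watch loop, using only the `jsonpath` query from the command above. The function names, the `METERING_NAMESPACE` variable, and the attempt/interval defaults are hypothetical and only here for illustration; the `openshift-metering` namespace default is an assumption that the operator lives where the rest of these notes look for it.

```shell
# Print the message of the "Invalid" condition on a MeteringConfig.
# Prints nothing if that condition is absent.
#   $1 - MeteringConfig name (defaults to operator-metering)
invalid_message() {
  oc -n "${METERING_NAMESPACE:-openshift-metering}" \
    get meteringconfig "${1:-operator-metering}" \
    -o jsonpath='{.status.conditions[?(@.type=="Invalid")].message}'
}

# Poll the MeteringConfig while Metering rolls out.
#   $1 - number of attempts (hypothetical default: 30)
#   $2 - seconds between attempts (hypothetical default: 10)
# Returns 1 as soon as an Invalid message is seen, 0 if none showed up.
watch_for_invalid() {
  attempts="${1:-30}"
  interval="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    msg="$(invalid_message)"
    if [ -n "$msg" ]; then
      echo "MeteringConfig reported an error: $msg" >&2
      return 1
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 0
}
```

Something like `watch_for_invalid 30 10 || echo "check the Ansible role output"` could then run alongside the rollout. It only tells us that no Invalid condition appeared during the polling window, not that the rollout finished.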
From the OpenShift documentation guidelines:
An Operator’s full name must be a proper noun, with each word initially capitalized. If it includes a product name, defer to the product’s capitalization style guidelines.
- https://github.com/openshift/openshift-docs/blob/master/contributing_to_docs/doc_guidelines.adoc#operator-name-capitalization
- From Kevin, "metering" is lowercase. If you are referring to the "Metering Operator," it is initially capitalized.
Do not provide examples which use jq. Examples should use a templating engine that is provided with oc, like jsonpath. See (https://bugzilla.redhat.com/show_bug.cgi?id=1764726#c6) for more information.
Updated Timeline:
- Release Notes - Call for Last Review - 6/29
- Localization Doc Freeze - 7/2
- Doc Freeze - 7/2
- Release Notes - Final Deadline - 7/2
- 4.5 GA - 7/9
A good debug check for metering installations
For debugging a failed Metering database (creation):
oc -n openshift-metering get storagelocations -o json | jq '.status'
If that status field is empty, we can infer that the reporting-operator cannot properly communicate with Hive, or that there's an issue with Hive server or Hive metastore.
Note: not a great check for upgrades as the reporting-operator is not going to re-process this resource if the status field is non-empty.
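Since the docs guidelines above ask us to avoid `jq` in examples, here is a rough sketch of the same check using `jsonpath`. The function names are hypothetical. It assumes an empty status prints as nothing, `{}`, or `map[]` depending on the client version, which has not been verified against a real cluster.

```shell
# Print one "<name><TAB><status>" line per StorageLocation.
storagelocation_statuses() {
  oc -n openshift-metering get storagelocations \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status}{"\n"}{end}'
}

# Print the names of StorageLocations whose status looks empty.
# The "{}" and "map[]" spellings are assumptions about how different
# oc/kubectl versions render an empty object.
empty_status_locations() {
  storagelocation_statuses |
    awk -F '\t' '$1 != "" && ($2 == "" || $2 == "{}" || $2 == "map[]") { print $1 }'
}
```

Any name printed by `empty_status_locations` would be a candidate for the Hive communication problem described above.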
Lindsey: it would be useful to actually go through the upgrade process and pull out more concrete data that we can include in the "Procedure" in the Metering upgrade, in the context of the OCP console.
Other notes:
- Figure out a decent Report that ensures that Metering is still functioning as intended, after the upgrade process.
- Are there any further measures we need to document beyond creating a Report, confirming there's new data in that report, and viewing the report data?
- It's a somewhat poor user experience to have to switch from the console view, to the CLI, and back to the console. Is there a way to alleviate this, e.g. we only upgrade Metering through the CLI, or only through the console, for consistency's sake?
- Sync up again later next week, try to get closer to a final draft, such that Peter (pruan) can start reviewing.
- Try to push for the week of the 15th to try and wrap this up - check-in with group lead (bparees) if they also need to review content.
- I need to start working towards release notes.
Since the last sync:
- Added quick write-up for release-4.5 notes: https://gist.github.com/timflannagan1/9cd998945a2521b2bcbd1db86904322e
- Lindsey added documentation and examples around tracking events. Dial down on the language used and whether or not it's too general.
- Investigate documenting the `MeteringConfig` status. This is useful for tracking down errors that may have occurred in the Ansible role while rolling out the Metering stack.
- Another open question is what's the most consumable way to track the progress of the Metering operator stack. Traditionally, this would be via events, but we need to double-check which is the more reliable implementation under the hood.
- Figure out a decent Report that ensures that Metering is still functioning as intended, after the upgrade process.
Timelines
Doc freeze: 06/04
Code freeze: 05/29
GA release: 06/18
Note: there may be wiggle room if we merge something concrete and we need to do minor follow-ups.
Next steps
ReportDataSource ..., etc.
Things to keep in mind
References