@reubenmiller
Last active November 27, 2023 16:33
WIP: thin-edge.io workflow punchlist (firmware update)

1. restart state file is not being deleted (after image has been restarted)

After a firmware workflow completes successfully, restarting the tedge-agent service causes tedge-agent to process the restart-current-operation file again, because the file is not deleted once the workflow has finished.

file: /data/tedge/.agent/restart-current-operation

// cat /data/tedge/.agent/restart-current-operation
{"target":"device/main//","cmd_id":"c8y-mapper-2023-11-26T12:26:39.248420821Z","payload":{"status":"executing","context":{"command":{"topic":{"name":"te/device/main///cmd/firmware_update/c8y-mapper-2023-11-26T12:26:39.248420821Z"},"status":"rollback_restart","payload":{"name":"core-image-tedge-mender-raspberrypi4-64","reason":"Nothing to commit. Either the boot loader triggered the rollback, the device was rebooted after switching to new partition, or someone did a manual rollback!","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4785008","status":"rollback_restart","url":"/var/tedge/4785008.firmware","version":"20231126122349"}},"onSuccess":"rollback_successful","onError":"failed"}}}
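One possible fix is to remove the state file once the command that created it has finished. The following is only a minimal sketch of that idea, not the tedge-agent implementation: the function name `clear_restart_state` is hypothetical, and it assumes the file contains a JSON object with a `cmd_id` field, as in the example above.

```python
import json
from pathlib import Path


def clear_restart_state(state_file: Path, finished_cmd_id: str) -> bool:
    """Delete the persisted restart state if it belongs to the finished command.

    Returns True when the file was removed, False otherwise.
    """
    try:
        state = json.loads(state_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Nothing to clean up, or the file is not valid JSON: leave it alone.
        return False
    if state.get("cmd_id") != finished_cmd_id:
        # The file belongs to a different command, so it must be kept.
        return False
    state_file.unlink()
    return True
```

Calling this when the workflow reaches a final state would mean that a later restart of tedge-agent finds no stale restart-current-operation file to process.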

2. Only parse last line of standard output for state transition information

If another process unexpectedly writes to standard output (stdout), it corrupts the state transition information that tedge-agent expects the state script to return. Users could prevent accidental writes by redirecting stdout to stderr (>&2), however this is annoying and error-prone, as it only takes one command without the redirection to break the output.

Proposal

The workflow script should mark any state update information with an explicit start and end block to clearly define which part of the process's standard output is used to communicate state changes to tedge-agent.

For example, a script can send the following output to indicate state change information:

:::begin-tedge:::
{"status":"success"}
:::end-tedge:::

These start/end blocks can be used within a shell script, and the contents between the blocks should be valid JSON (either compact or pretty-printed).

echo "This message written to stdout is not interpreted by tedge-agent as state change information"

printf ':::begin-tedge:::\n'
echo '{
  "status": "failed"
}'
printf ':::end-tedge:::\n'
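To illustrate how the consuming side could handle such output, here is a minimal sketch of a parser, assuming the `:::begin-tedge:::` and `:::end-tedge:::` markers from the proposal. The function name `extract_state` and the "last complete block wins" rule are assumptions made for this example, not existing tedge-agent behaviour.

```python
import json
from typing import Optional

BEGIN = ":::begin-tedge:::"
END = ":::end-tedge:::"


def extract_state(stdout: str) -> Optional[dict]:
    """Return the JSON of the last complete marked block, or None if there is none."""
    state = None
    block = None  # lines collected since the most recent BEGIN marker
    for line in stdout.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            block = []
        elif stripped == END and block is not None:
            try:
                state = json.loads("\n".join(block))
            except json.JSONDecodeError:
                pass  # ignore a malformed block and keep any earlier valid state
            block = None
        elif block is not None:
            block.append(line)
        # Lines outside a block are treated as ordinary log output and ignored.
    return state


output = (
    "This message is not state information\n"
    ":::begin-tedge:::\n"
    '{\n  "status": "failed"\n}\n'
    ":::end-tedge:::\n"
)
print(extract_state(output))  # {'status': 'failed'}
```

With this approach, anything written to stdout outside the markers can no longer corrupt the state update, so scripts would not need to redirect every command to stderr.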

3. Show evaluated template string of workflow script call rather than the template

To make debugging easier, the log should show the shell command with its template variables expanded, so that the user can simply copy/paste the command to run it manually.

Below is an example of what is currently shown to the user (the template variables are not evaluated):

INFO tedge_agent::tedge_operation_converter::actor: Processing firmware_update operation install step with script: /etc/tedge/operations/mender_workflow.sh install --id ${.topic.cmd_id} --url ${.payload.url} --on-success commit --on-restart restart --on-error failed
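The following sketch shows what logging the evaluated command could look like. It is an illustration only: the function `render_command`, the regular expression, and the shape of the `context` dictionary are assumptions, and the real tedge-agent template resolution may differ.

```python
import re
import shlex

# Matches placeholders such as ${.topic.cmd_id} or ${.payload.url}
_VAR = re.compile(r"\$\{\.([A-Za-z0-9_.]+)\}")


def render_command(template: str, context: dict) -> str:
    """Replace ${.a.b} placeholders with shell-quoted values from context."""

    def lookup(match: "re.Match[str]") -> str:
        value = context
        try:
            for key in match.group(1).split("."):
                value = value[key]
        except (KeyError, TypeError):
            return match.group(0)  # leave unknown placeholders untouched
        return shlex.quote(str(value))

    return _VAR.sub(lookup, template)


template = (
    "/etc/tedge/operations/mender_workflow.sh install "
    "--id ${.topic.cmd_id} --url ${.payload.url} --on-success commit"
)
# Hypothetical context, using values taken from the log excerpts in this gist
context = {
    "topic": {"cmd_id": "c8y-mapper-2023-11-24T21:50:50.731671965Z"},
    "payload": {"url": "/var/tedge/4773468"},
}
print(render_command(template, context))
```

Quoting each value with `shlex.quote` keeps the logged line safe to paste into a shell even if a value contains spaces or other special characters.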

4. te/errors about invalid states

Currently, error messages are published on the te/errors topic when a user uses custom workflow state names. Below is an example of the MQTT messages published, including the message on te/errors.

[te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T21:50:50.731671965Z] {"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4773468","status":"install","url":"/var/tedge/4773468","version":"20231124204459"}
[te/errors] unknown variant `install`, expected one of `init`, `scheduled`, `executing`, `successful`, `failed` at line 1 column 180

5. Workflow triggered restart operation is picked up by the mapper

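The restart command triggered by the workflow reuses the c8y-mapper-... command id, so the mapper reacts to it (see the 501,c8y_Restart message at the end of the log below). A different command prefix should be used, e.g. workflow-2023-11-24T23:09:15.591533089Z (or something like that, just not c8y-...).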

[te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T23:09:15.591533089Z] {"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4774439","status":"restart","url":"/var/tedge/4774439","version":"20231124230142"}
[te/errors] unknown variant `restart`, expected one of `init`, `scheduled`, `executing`, `successful`, `failed` at line 1 column 180
[te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T23:09:15.591533089Z] {"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4774439","status":"restarting","url":"/var/tedge/4774439","version":"20231124230142"}
[te/errors] unknown variant `restarting`, expected one of `init`, `scheduled`, `executing`, `successful`, `failed` at line 1 column 183
[te/device/main///cmd/restart/c8y-mapper-2023-11-24T23:09:15.591533089Z] {"status":"executing","context":{"command":{"topic":{"name":"te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T23:09:15.591533089Z"},"status":"restart","payload":{"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4774439","status":"restart","url":"/var/tedge/4774439","version":"20231124230142"}},"onSuccess":"restarted","onError":"failed_restart"}}
[c8y/s/us] 501,c8y_Restart
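As a rough sketch of the prefix idea, the snippet below generates a command id with a separate prefix and shows how a consumer could tell the two kinds of ids apart. The `workflow-` prefix is the one suggested above; the function names and the filtering rule are hypothetical and do not describe how the mapper currently works. Note that Python timestamps only have microsecond precision, so the id is shorter than the ones in the log.

```python
from datetime import datetime, timezone

MAPPER_PREFIX = "c8y-mapper-"  # prefix seen in the log excerpts
WORKFLOW_PREFIX = "workflow-"  # prefix suggested in this punchlist item


def new_workflow_cmd_id(now: datetime) -> str:
    """Build a command id for a command that was triggered by a workflow."""
    timestamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")
    return f"{WORKFLOW_PREFIX}{timestamp}Z"


def is_mapper_command(topic: str) -> bool:
    """Return True if the last topic segment (the command id) uses the mapper prefix."""
    cmd_id = topic.rsplit("/", 1)[-1]
    return cmd_id.startswith(MAPPER_PREFIX)


when = datetime(2023, 11, 24, 23, 9, 15, 591533, tzinfo=timezone.utc)
restart_topic = "te/device/main///cmd/restart/" + new_workflow_cmd_id(when)
print(restart_topic)
print(is_mapper_command(restart_topic))  # False
```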

6. mender can't check the size of the firmware image when trying to download via the c8y proxy - Reuben to investigate

This prevents using the url from Cumulocity IoT directly. That is a pity, because the proxy is really nice: it abstracts the authentication, which is required since mender does not allow specifying credentials on the artifact url.

Check if the c8y proxy is filtering out some of this information (e.g. specific headers), or if it allows a HEAD request. I'm fairly sure that Cumulocity IoT does allow checking the content length, but this is hard to confirm as mender cannot accept credentials in the artifact url.

INFO[0004] Performing remote update from: [http://127.0.0.1:8001/c8y/inventory/binaries/4773468].
ERRO[0004] Error while installing Artifact from command line: Will not continue with unknown image size.
2023-11-24T21:11:43 [cmd=x, current=install] ERROR. Unexpected mender return code
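One way to investigate this is to look at the response headers directly. The sketch below is an assumption-laden helper for that check: `known_size` and `head_headers` are hypothetical names, and whether the c8y proxy accepts HEAD requests is exactly the open question here, so `head_headers` may fail with an HTTP error.

```python
import urllib.request
from typing import Mapping, Optional


def known_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length as an int, or None if the size is unknown."""
    normalized = {key.lower(): value for key, value in headers.items()}
    value = normalized.get("content-length")
    if value is None or not value.strip().isdigit():
        # e.g. a chunked response, which has no Content-Length header
        return None
    return int(value)


def head_headers(url: str) -> dict:
    """Send a HEAD request and return the response headers (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


print(known_size({"Content-Length": "1024"}))        # 1024
print(known_size({"Transfer-Encoding": "chunked"}))  # None
```

On the device, `known_size(head_headers("http://127.0.0.1:8001/c8y/inventory/binaries/4773468"))` would show whether the proxy response carries a size at all.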

Outcome

This looks to be a limitation of the mender client due to:

The above logic expects a Content-Length header to be set in the response, even though this is not provided when Cumulocity IoT supports byte range headers (thus the response is chunked). The mender client rewritten in C++ seems to handle chunked downloads (https://github.com/mendersoftware/mender/pull/1501/files), however it is not available yet.

So, long story short, the mender client is not compatible with downloading firmware directly from Cumulocity IoT (except with a manual patch in yocto which ignores this check).

7. Existing values of firmware info are overwritten on startup

The firmware workflow updates the firmware info values, however if tedge-agent is restarted, the existing values are overwritten on startup.

[te/device/main///cmd/firmware_update] {}

Proposal

  • Currently unknown, however it needs to be discussed whether the current firmware information should be stored in a different location rather than in the /cmd/firmware topic, as this topic should indicate that the firmware_update function is available and not which version is currently applied.