If the tedge-agent service is restarted after a firmware workflow has completed successfully, tedge-agent processes the restart-current-operation file again, because the file is not deleted when the workflow completes.
File: /data/tedge/.agent/restart-current-operation
If another process writes unexpectedly to standard output (stdout), it corrupts the state transition information that tedge-agent expects the state script to return. Users can prevent accidental writes by redirecting stdout to stderr (>&2). However, this is tedious and error-prone, because a single command whose output is not redirected is enough to corrupt the result.
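One way to avoid redirecting every command individually is to save the original stdout on another file descriptor once, and send all other stdout to stderr. The sketch below uses POSIX shell. The function name `install_step`, the choice of descriptor 3, and the `{"status":"successful"}` payload are illustrative assumptions. They are not part of tedge-agent.

```shell
#!/bin/sh
# Hypothetical workflow step: only what is written to fd 3 reaches the
# process's real stdout; all other stdout output is diverted to stderr.
install_step() {
    {
        # Noisy commands no longer need an individual ">&2".
        echo "Downloading firmware image..."
        echo "Installing firmware image..."

        # State information is written explicitly to fd 3 (the original stdout).
        printf '%s\n' '{"status":"successful"}' >&3
    } 3>&1 1>&2  # first fd 3 = original stdout, then stdout -> stderr
}

install_step
```

In a standalone script, `exec 3>&1 1>&2` near the top has the same effect for the whole script. This reduces the risk, but it still relies on the script author writing the state to the right descriptor.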
Proposal
The workflow script should wrap the state update information in explicit start and end markers. This clearly defines which part of the process's standard output is used to communicate state changes to tedge-agent.
For example, a script can send the following output to indicate state change information:
:::begin-tedge:::
{"status":"success"}
:::end-tedge:::
These start/end markers can be used within a shell script. The content between the markers must be valid JSON (either compact or pretty-printed):
echo "This message written to stdout is not interpreted by tedge-agent as state change information"
printf ':::begin-tedge:::\n'
echo '{
"status": "failed"
}'
printf ':::end-tedge:::\n'
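The following POSIX shell/awk sketch shows how a reader of the script's output could pick out the marked block. `extract_tedge_block` is a hypothetical helper, not tedge-agent's actual parser. It assumes that each marker is on its own line and uses only the first begin/end pair.

```shell
#!/bin/sh
# Print only the lines between the first ":::begin-tedge:::" and the
# following ":::end-tedge:::" line (markers excluded). Everything else
# written to stdout is ignored.
extract_tedge_block() {
    awk '
        $0 == ":::begin-tedge:::" && !seen { inside = 1; seen = 1; next }
        $0 == ":::end-tedge:::" && inside  { inside = 0; next }
        inside                             { print }
    '
}
```

For example, `sh my_state_script.sh | extract_tedge_block` (script name hypothetical) prints only the JSON between the markers. The result can then be handed to a JSON parser. Output without any markers yields nothing.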
To make debugging easier, the logged shell command should have its variables expanded, so that the user can simply copy/paste it to run the command manually.
Below is an example of what is shown to the user. The ${...} variables are not expanded, so the command cannot be run as-is.
INFO tedge_agent::tedge_operation_converter::actor: Processing firmware_update operation install step with script: /etc/tedge/operations/mender_workflow.sh install --id ${.topic.cmd_id} --url ${.payload.url} --on-success commit --on-restart restart --on-error failed
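The following minimal POSIX shell sketch illustrates the idea. Once the variables have been expanded, each argument is quoted so that the logged line can be pasted straight back into a shell. `shell_quote` and `format_command` are hypothetical helper names. The sketch only shows the quoting, not how tedge-agent builds its log message.

```shell
#!/bin/sh
# Wrap one argument in single quotes, escaping embedded single quotes as '\''.
# Note: $(...) drops trailing newlines from the argument (acceptable for a log line).
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print a whole command line with every argument quoted.
format_command() {
    out=''
    for arg in "$@"; do
        out="$out $(shell_quote "$arg")"
    done
    printf '%s\n' "${out# }"
}

# Example with the variables already expanded (values taken from the logs in this note).
format_command /etc/tedge/operations/mender_workflow.sh install \
    --id "c8y-mapper-2023-11-24T21:50:50.731671965Z" \
    --url "/var/tedge/4773468" \
    --on-success commit --on-restart restart --on-error failed
```

Quoting every argument is slightly noisier than quoting only where needed. However, it stays correct for values containing spaces or quotes.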
Currently, error messages are published on the te/errors topic when custom workflow state names are used. Below is an example of the MQTT messages published on te/errors:
[te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T21:50:50.731671965Z] {"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4773468","status":"install","url":"/var/tedge/4773468","version":"20231124204459"}
[te/errors] unknown variant `install`, expected one of `init`, `scheduled`, `executing`, `successful`, `failed` at line 1 column 180
A different command ID prefix should be used, e.g. workflow-2023-11-24T23:09:15.591533089Z (or something similar, just not c8y-...).
[te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T23:09:15.591533089Z] {"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4774439","status":"restart","url":"/var/tedge/4774439","version":"20231124230142"}
[te/errors] unknown variant `restart`, expected one of `init`, `scheduled`, `executing`, `successful`, `failed` at line 1 column 180
[te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T23:09:15.591533089Z] {"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4774439","status":"restarting","url":"/var/tedge/4774439","version":"20231124230142"}
[te/errors] unknown variant `restarting`, expected one of `init`, `scheduled`, `executing`, `successful`, `failed` at line 1 column 183
[te/device/main///cmd/restart/c8y-mapper-2023-11-24T23:09:15.591533089Z] {"status":"executing","context":{"command":{"topic":{"name":"te/device/main///cmd/firmware_update/c8y-mapper-2023-11-24T23:09:15.591533089Z"},"status":"restart","payload":{"name":"tedge-yocto-mender","remoteUrl":"https://t2873877.latest.stage.c8y.io/inventory/binaries/4774439","status":"restart","url":"/var/tedge/4774439","version":"20231124230142"}},"onSuccess":"restarted","onError":"failed_restart"}}
[c8y/s/us] 501,c8y_Restart
6. mender can't check the size of the firmware image when trying to download via the c8y proxy - Reuben to investigate
This prevents using the URL from Cumulocity IoT directly. This is unfortunate because the proxy is really useful: it handles authentication, which is required since mender does not allow credentials to be specified in the artifact URL.
Check whether the c8y proxy filters out some of this information (e.g. specific headers), or whether it allows a HEAD request. Cumulocity IoT itself most likely does allow checking the content length, but this is hard to confirm because mender cannot accept credentials in the artifact URL.
INFO[0004] Performing remote update from: [http://127.0.0.1:8001/c8y/inventory/binaries/4773468].
ERRO[0004] Error while installing Artifact from command line: Will not continue with unknown image size.
2023-11-24T21:11:43 [cmd=x, current=install] ERROR. Unexpected mender return code
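The header check suggested above can be done with curl. A HEAD request (`curl -sI`) shows whether a Content-Length header is returned. The filter below is a hypothetical helper in POSIX shell. It extracts a named header from raw response headers, so the proxy URL and a direct Cumulocity IoT URL can be compared easily.

```shell
#!/bin/sh
# Print the value(s) of an HTTP header read from raw response headers on stdin.
# Header names are matched case-insensitively; CR characters are stripped first.
header_value() {
    tr -d '\r' | awk -F': *' -v name="$1" 'tolower($1) == tolower(name) { print $2 }'
}

# Usage (needs a running c8y proxy; URL taken from the mender log):
#   curl -sI http://127.0.0.1:8001/c8y/inventory/binaries/4773468 | header_value Content-Length
#   curl -s -D - -o /dev/null http://127.0.0.1:8001/c8y/inventory/binaries/4773468 | header_value Transfer-Encoding
```

The second command dumps the headers of a normal GET request while discarding the body. An empty Content-Length together with `Transfer-Encoding: chunked` would be consistent with the "unknown image size" error above.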
Outcome
This appears to be a limitation of the mender client. It expects the response to include a Content-Length header, even though this header is not provided when Cumulocity IoT supports byte-range requests (the response is chunked instead). The mender client rewritten in C++ seems to handle chunked updates (https://github.com/mendersoftware/mender/pull/1501/files); however, it is not available yet.
In short, the mender client is not compatible with downloads from Cumulocity IoT via the c8y proxy, except with a manual patch in Yocto that skips this check.
The firmware workflow updates the current firmware information on the topic below. However, if tedge-agent is restarted, the existing value is overwritten with an empty payload:
[te/device/main///cmd/firmware_update] {}
Proposal
- Currently unknown. However, it needs to be discussed whether the current firmware information should be stored in a different location rather than on the /cmd/firmware topic. This topic should indicate that the firmware_update function is supported, not the version of the currently applied software.