First published: 2021-01-30
Last update: 2024-07-15

edited with: https://stackedit.io (why is this not built-in into the .md gist editor 😵⁉)

`systemd` knowledge and debugging units

Sometimes non-obvious things or errors happen when managing or creating systemd units, and it is easy to waste time trying to figure out the cause. Here is a brain dump for future me/you/us ✌😉

A pitfall I keep wasting time on is expecting all output/errors to be in the unit journal. If something goes wrong before the unit starts, the output/errors may not be in the unit journal, so you need to remove the unit filter and/or check /var/log/messages or perhaps /var/log/syslog, depending on how your distro is configured.
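A minimal sketch of widening the search when the unit journal looks empty. The function name show_startup_logs, the unit name and the ten-minute default window are placeholders of mine; the journalctl options used (-u, --since, --no-pager) are standard ones.

```shell
# Placeholder unit name; substitute your own.
service=your-awesome.service

# Print recent entries for the unit, then the same time window with no unit filter.
show_startup_logs() {
    unit=$1
    since=${2:--10min}   # relative timestamp, i.e. "ten minutes ago"
    echo "== unit journal: $unit =="
    journalctl --no-pager --since="$since" -u "$unit"
    echo "== whole journal, same window =="
    journalctl --no-pager --since="$since"
}

# Usage: show_startup_logs "$service"
```

Comparing the two outputs side by side usually shows quickly whether the error was logged under a different unit or by systemd itself.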

Search terms that might land here

"systemd unit output and errors go to messages and not my unit journal"
"how do I debug systemd units?"
"see all output and errors for systemd unit start up"
"where does systemd log output and errors?"

Service Types

oneshot

Suitable for scripts and programs that start, run a workflow and then exit. The RemainAfterExit= directive is particularly useful in this case.
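A minimal sketch of a oneshot unit, assuming a hypothetical unit called run-once.service. The snippet writes the file into a scratch directory so it is safe to run anywhere; copy it to /etc/systemd/system/ to try it for real.

```shell
# Scratch directory stands in for /etc/systemd/system/
unit_dir=$(mktemp -d)

cat > "$unit_dir/run-once.service" <<'EOF'
[Unit]
Description=Hypothetical one-off setup job

[Service]
Type=oneshot
# Keep the unit "active" after the command exits
RemainAfterExit=yes
ExecStart=/usr/bin/touch /run/run-once.done

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_dir/run-once.service"
```

Without RemainAfterExit=yes the unit drops back to inactive as soon as the command finishes, which is fine for jobs triggered by a timer but confusing when other units depend on it.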

simple

Suitable for long-running programs that stay in the foreground, i.e. they don't fork to a background process.

forking

Discussed on StackOverflow [here]. The use of the forking type is discouraged per systemd docs [here].

others

Other types include exec|dbus|notify|notify-reload|idle per systemd docs [here].

Absolute paths

Remember to use absolute paths in naked unit directives, i.e. anywhere in the unit config that is not passed through an interpreter like bash.

systemd supports useful Specifiers, like user home dir %h, hostname %H, and unit name %n.

Some directives support env vars, which can replace absolute paths in certain cases.

env vars and usage

Not all systemd unit directives support env vars; some, e.g. WorkingDirectory, demand absolute path literals.

Outside of interpreters like bash, i.e. in naked unit directives such as ExecStart, prefer the ${VAR} style. ${VAR} is expanded even when it is part of a larger word and always yields a single argument. A bare $VAR is only expanded when it stands as a separate word, and its value is then split on whitespace.

You can define unit specific env vars with the Environment unit directive. It's very useful to use this directive for env specific drop-ins. You can have a generic unit file for all envs and then use env specific drop-ins to override the main unit. e.g.

# generic unit
your-awesome.service
# env specific drop-in
your-awesome.service.d/prod.conf
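A minimal sketch of that layout, written into a scratch directory. The variable names APP_ENV and APP_HOME and the /opt/your-awesome paths are invented for illustration.

```shell
# Scratch directory stands in for /etc/systemd/system/
unit_dir=$(mktemp -d)
mkdir -p "$unit_dir/your-awesome.service.d"

# Generic unit: defaults that suit every environment
cat > "$unit_dir/your-awesome.service" <<'EOF'
[Service]
Environment=APP_ENV=dev
Environment=APP_HOME=/opt/your-awesome
# No env var support here, so the path is spelled out
WorkingDirectory=/opt/your-awesome
# ${VAR} is substituted inside a word; the program path itself stays literal
ExecStart=/opt/your-awesome/bin/run --config ${APP_HOME}/etc/${APP_ENV}.conf
EOF

# Env specific drop-in: a later assignment of the same variable wins
cat > "$unit_dir/your-awesome.service.d/prod.conf" <<'EOF'
[Service]
Environment=APP_ENV=prod
EOF

ls -R "$unit_dir"
```

Once both files are installed for real, systemctl cat your-awesome.service prints the unit together with its drop-ins, which is a quick way to confirm the override was picked up.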

Automatic `daemon-reload`

If you use the systemctl edit command then daemon-reload is implicit and not required separately.

Create a service

systemctl edit --full --force $service

Edit a service

# daemon-reload equivalent is handled automatically

systemctl edit --full $service

Edit a drop-in

Drop-ins live in the .d directory of your unit. e.g. your-awesome.service.d/override.conf.

The default override.conf drop-in can be edited with

systemctl edit $service

To edit a custom drop-in you must use your preferred $EDITOR, and ensure you run systemctl daemon-reload afterwards.
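A minimal sketch of scripting a custom drop-in instead of using an editor. write_dropin is a hypothetical helper, and the MemoryMax= setting is just an illustrative payload. The third argument lets you dry-run against a scratch directory; the reload only happens when writing to the real location.

```shell
# Write a drop-in body from stdin to <unit_root>/<unit>.d/<name>.conf
write_dropin() {
    unit=$1
    name=$2
    unit_root=${3:-/etc/systemd/system}
    mkdir -p "$unit_root/$unit.d" || return 1
    cat > "$unit_root/$unit.d/$name.conf" || return 1
    # Custom drop-ins are not picked up until systemd re-reads its config
    if [ "$unit_root" = /etc/systemd/system ]; then
        systemctl daemon-reload
    fi
}

# Dry run against a scratch directory
scratch=$(mktemp -d)
printf '[Service]\nMemoryMax=512M\n' | write_dropin your-awesome.service limits "$scratch"
cat "$scratch/your-awesome.service.d/limits.conf"
```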

Show all runtime drop-in changes for system units

systemd-delta --type=extended

Requires vs Wants vs BindsTo

cite: https://unix.stackexchange.com/a/388881/19406

Wants= is a weaker version of Requires=. Units listed in this option will be started if the configuring unit is. However, if the listed units fail to start or cannot be added to the transaction, this has no impact on the validity of the transaction as a whole. This is the recommended way to hook start-up of one unit to the start-up of another unit.

Requires= configures requirement dependencies on other units. If this unit gets activated, the units listed here will be activated as well. If one of the other units gets deactivated or its activation fails, this unit will be deactivated. Often, it is a better choice to use Wants= instead of Requires= in order to achieve a system that is more robust when dealing with failing services.

Use the BindsTo= dependency type together with After= to ensure that a unit may never be in active state without a specific other unit also in active state.
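A minimal sketch of the BindsTo= plus After= combination, assuming two hypothetical units, consumer.service and backend.service. The file is written to a scratch directory only.

```shell
# Scratch directory stands in for /etc/systemd/system/
unit_dir=$(mktemp -d)

cat > "$unit_dir/consumer.service" <<'EOF'
[Unit]
Description=Hypothetical consumer that must never outlive its backend
# Requirement: stop this unit whenever backend.service leaves the active state
BindsTo=backend.service
# Ordering: do not start until backend.service has been started
After=backend.service

[Service]
ExecStart=/opt/consumer/bin/run
EOF

cat "$unit_dir/consumer.service"
```

Requirement and ordering are independent in systemd, which is why both lines are needed: BindsTo= alone would start the two units in parallel.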

Unit sequencing / ordering dependencies

The Before and After properties configure the ordering dependencies between units. It's well described in the docs here.

Send a notification after a unit has been started or stopped/failed

Use ExecStartPost for the start scenario, and ExecStopPost for the stop scenario.

This example uses e-mail as the notification channel. Of course, you can use your own preference, such as gotify, or any command that can create a notification.

💡 Note that ExecStopPost handles all stop scenarios, including failure. $SERVICE_RESULT indicates why the stop happened. Note the %% double percent signs in the date format: they escape the systemd.unit specifiers. Note also that $( .. ) starts a new quoting context, so the nested quotes within $( .. ) don't need special handling.

ExecStartPost=/bin/sh -c '/usr/bin/mail -s "%N ExecStartPost $(date +"%%H%%M%%Z")" [email protected] </dev/null'

ExecStopPost=/bin/sh -c '/usr/bin/mail -s "%N ExecStopPost $SERVICE_RESULT $(date +"%%H%%M%%Z")" [email protected] </dev/null'
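To check the quoting by hand, here is a sketch of roughly what /bin/sh receives once systemd has replaced %N with the unit name and collapsed each %% to a single %. The unit name your-awesome and the success value are stand-ins, and echo replaces the mail command so nothing is sent.

```shell
# systemd exports SERVICE_RESULT to ExecStopPost commands; fake it here
SERVICE_RESULT=success
export SERVICE_RESULT

# Same quoting structure as the unit line: single quotes outside,
# double quotes for the subject, and fresh double quotes inside $( .. )
subject=$(/bin/sh -c 'echo "your-awesome ExecStopPost $SERVICE_RESULT $(date +"%H%M%Z")"')
echo "$subject"
```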

You may wish to check the sendemail program [E.g. Debian package] for a lightweight mail submission agent (MSA) that supports local or relay servers, i.e. it does not depend on a local mail transfer agent (MTA) the way the mail command, a mail user agent (MUA), does. Example command:

ExecStopPost=/bin/sh -c '/usr/bin/sendemail -f %u@%H -t [email protected] -u "%N ExecStopPost $SERVICE_RESULT $(date +"%%H%%M%%Z")" </dev/null >/dev/null'

Templates & parameterized unit names

Consider the following template example:

/lib/systemd/system/[email protected]
/lib/systemd/system/[email protected]

One can then issue systemctl enable --now [email protected]
Note that the string after the @ is referred to as the "instance name" and is used in the template files with the %i specifier.

When this command is issued, systemd references (or copies?) the template and enables and starts the unit. I personally find this quite a slick solution (where templates are a valid use case).

I've seen cases where the template is copied/constructed to /etc/systemd/system and cases where the template is only referenced. YMMV depending on your template and version of systemd.

From the systemd unit docs:

Unit names can be parameterized by a single argument called the "instance name". The unit is then constructed based on a "template file" which serves as the definition of multiple services or other units. A template unit must have a single "@" at the end of the unit name prefix (right before the type suffix). The name of the full unit is formed by inserting the instance name between "@" and the unit type suffix. In the unit file itself, the instance parameter may be referred to using "%i" and other specifiers...
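A minimal sketch of a template unit, assuming a hypothetical worker@.service whose instance name selects a queue. The names, paths and the --queue flag are invented for illustration, and the file is written to a scratch directory only.

```shell
# Scratch directory stands in for /etc/systemd/system/
unit_dir=$(mktemp -d)

cat > "$unit_dir/worker@.service" <<'EOF'
[Unit]
Description=Hypothetical worker for queue %i

[Service]
# %i is replaced with whatever follows the @ in the instance name
ExecStart=/opt/worker/bin/run --queue %i

[Install]
WantedBy=multi-user.target
EOF

# With the template installed for real, each instance is addressed by name:
#   systemctl enable --now worker@emails.service
#   systemctl status worker@reports.service
ls "$unit_dir"
```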

System units vs. User units

Gotcha! Be aware that users with a UID < 1000 are considered system users, and systemd seems to care about this: I've seen some weird stuff with journal entries not appearing as expected for user units running under a system user. I'm also assuming systemd will not start and/or linger a systemd --user process for system users, as their processes should belong to the system.slice. A best practice is therefore to set up user units only for non-system users.

User units are a very nice feature to enable a normal user to create their own units for their workloads. A nice combo I've used in the past is a service and timer unit combo to run periodic user workloads, with all the benefits of systemd like the journal, timers and process management etc. The user doesn't require any specific privileges or access to set up cron jobs.

Creation, editing and drop-ins work exactly the same way as for system services; the --user option is the key.

service=test.service
# create
# $ systemctl --user edit --force --full $service
# edit
# $ systemctl --user edit --full $service
# edit drop-in
# $ systemctl --user edit $service
 
# reload --user systemd e.g. after adding/editing a custom drop-in
# $ systemctl --user daemon-reload

timer=test.timer
# enable and start a timer, so it will survive reboots
# $ systemctl --user enable --now $timer
# check a timer is configured as expected
# $ systemctl --all --user list-timers

# look at the journal from the start
# $ journalctl --user-unit $service

# jump to the end of the journal
# $ journalctl -e --user-unit $service

# follow/tail the journal
# $ journalctl -f --user-unit $service

The files are created under ~user/.config/systemd/user/

Starting a user service at boot

To start at boot, user service units must use WantedBy=default.target, and not multi-user.target, within the [Install] section.

User services that are started by a timer and not required at boot can leave out the WantedBy property.
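A minimal sketch of the service and timer combo, reusing the test.service and test.timer names from above. The workload script path and the 15 minute schedule are assumptions. The files go to a scratch directory here; for real use they belong under ~/.config/systemd/user/.

```shell
# Scratch directory stands in for ~/.config/systemd/user/
unit_dir=$(mktemp -d)

# The service has no [Install] section: only the timer starts it
cat > "$unit_dir/test.service" <<'EOF'
[Unit]
Description=Hypothetical periodic user workload

[Service]
Type=oneshot
# %h expands to the home directory of the user running the unit
ExecStart=%h/bin/workload.sh
EOF

# A timer activates the service with the same base name by default
cat > "$unit_dir/test.timer" <<'EOF'
[Unit]
Description=Run test.service every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
EOF

ls "$unit_dir"
```

Keep in mind that the per-user systemd instance normally only runs while the user has a session. If the timer should also fire when nobody is logged in, enabling lingering for that user with loginctl enable-linger is the usual approach.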

Debugging

# convenience var to define which service you want to work with
service=your-awesome.service

# it's always good to check assumptions are correct by inspecting the full unit config
systemctl show $service

# start a service, follow the unit log with -p7 (debug)
systemctl start $service ; journalctl -p7 -fu $service

# start a service, start at the beginning of the journal for the unit
systemctl start $service ; journalctl -p7 -u $service

# start a service, follow the unit log with -p7 (debug), and -x extra details
systemctl start $service ; journalctl -p7 -xf -u $service

# start a service, follow the journal with -x extra details (all units)
systemctl start $service ; journalctl -p7 -xf

# sometimes errors do not go into the unit log, for example if ExecStart has errors
# start a service, follow messages
systemctl start $service ; tail -f /var/log/messages /var/log/syslog

Dependencies

List dependencies of a service

systemctl list-dependencies $service --all

List dependants of a service

systemctl list-dependencies $service --all --reverse

You can use --reverse or --before or --after to modify what is listed.

cite: https://unix.stackexchange.com/a/583974/19406

Modify `systemd` default log level

It is also possible to increase the default systemd log level to debug, but this may not be that useful which is why I mention it last.

vim /etc/systemd/system.conf

# edit the LogLevel=debug, write+exit

systemctl daemon-reload

Modify `systemd` log level at runtime

Modern versions of systemd provide the following options to change log level at runtime

systemctl log-level debug

systemctl log-level info
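A minimal sketch of raising the log level only for the duration of one command and then restoring whatever was set before. with_debug_log_level is a hypothetical helper; it relies on systemctl log-level printing the current level when called without an argument, and it needs root.

```shell
# Run a command with the systemd manager at debug level, then restore the old level
with_debug_log_level() {
    previous=$(systemctl log-level) || return 1
    systemctl log-level debug || return 1
    "$@"
    status=$?
    systemctl log-level "$previous"
    return "$status"
}

# Usage (as root):
#   with_debug_log_level systemctl restart your-awesome.service
```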

`systemd-analyze`

TODO

systemd-analyze critical-chain
systemd-analyze critical-chain network.target
systemd-analyze critical-chain --fuzz 1h
systemd-analyze blame
systemd-analyze plot
systemd-analyze dot
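The plot and dot subcommands write SVG and GraphViz data to stdout, so they are normally redirected or piped. A minimal sketch with two hypothetical wrapper functions; the default file names are my own, and the second function assumes graphviz's dot is installed.

```shell
# Render the boot timeline as an SVG (open it in a browser)
save_boot_plot() {
    systemd-analyze plot > "${1:-boot.svg}"
}

# Render the dependency graph of units matching a pattern
save_dependency_graph() {
    pattern=$1
    systemd-analyze dot "$pattern" | dot -Tsvg > "${2:-deps.svg}"
}

# Usage:
#   save_boot_plot /tmp/boot.svg
#   save_dependency_graph 'your-awesome.*' /tmp/your-awesome.svg
```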

References

https://www.freedesktop.org/software/systemd/man/systemd.service.html
https://www.freedesktop.org/software/systemd/man/systemd.syntax.html
https://www.freedesktop.org/software/systemd/man/systemd.unit.html
https://www.freedesktop.org/software/systemd/man/systemd.exec.html
