Created
March 1, 2019 18:02
Wind up and point all the optional safety checks which bash offers for scripts
Usage - include this script before doing serious work in another script.
You will need to leave your script using do_exit, not plain old exit, because bash also uses exit as its way of flagging faults.
# No shebang because we are included in another sh file | |
# Files which include this one MUST use bash, since we use the 'source' command below. | |
# @FIXME DGC 16-Aug-2016 | |
# I'm not clear why this script doesn't use a guard variable as per the coding standard. | |
# @file script-start.inc.sh | |
# | |
# @brief This file should be invoked at start of almost all script files in our project | |
# to trap and log script errors so we can find what the nature of the collapse was. | |
# | |
# Note that you will save a lot of grief if you get your script to pass the tests here first:
# https://www.shellcheck.net/ | |
# it's totally tedious, and totally worth the effort. | |
# | |
# This script should be usable anywhere. | |
# However if it can find the files: | |
# /usr/bin/NetrixFileNames.inc.sh | |
# $configIncAPAFN which will normally be /etc/dexdyne/config/webcontrol.conf | |
# it includes their source. | |
# The 'file names' file can provide $consoleLogsAPADN which will then be used below in crash reporting. | |
# | |
# This script: | |
# Can't run if you already have an EXIT trap active, because this script needs to set that for itself. | |
# Sets the flag so that 'failure' ( non-zero return codes ) | |
# of intermediate stages in a pipeline abort the pipeline. | |
# That isn't standard bash behaviour, and we may need to fix existing code | |
# which expects to get away with such things. | |
# | |
# @param notrap If passed "notrap", the file will run as normal, but without setting -e and traps. | |
# this may ( or may not ) be useful in debugging the exact reasons for a failure. | |
# | |
# @return No return code. | |
# Some unrecoverable errors will exit the calling script. | |
# They should write a failure message to the event log. | |
# | |
# @copyright Copyright Dexdyne Ltd. 2013-2019. All Rights Reserved. | |
# | |
# @author DGC | |
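# A minimal sketch, assuming bash, of the pipeline behaviour listed in the header above:
# with pipefail off a pipeline reports only the status of its last command,
# with pipefail on a failing intermediate stage makes the whole pipeline fail.
# The helper name pipelineStatus is hypothetical and is not part of this project.

```shell
# Print the exit status of "false | true" with pipefail off ( +o ) or on ( -o ).
# The pipeline runs in a child bash so the caller's shell options are left untouched.
pipelineStatus()
{
    local option="$1"    # "+o" leaves pipefail off, "-o" turns it on
    bash -c "set ${option} pipefail ; false | true ; echo \$?"
}
```

# Calling 'pipelineStatus +o' and then 'pipelineStatus -o' shows the difference between the two settings.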
# This script sets up many "do something on trap" mechanisms, which write to the event log file | |
# so at least we'll have a record that a script died when we didn't expect it to. | |
# It also makes the script run with "stop on any error",
# which is a tough criterion to meet - some commands can return exit codes
# which look for all the world like error flags
# ( beware of the sleep command, for instance )
# for conditions you wouldn't want to abort everything on.
# | |
# see: | |
# http://www.davidpashley.com/articles/writing-robust-shell-scripts.html | |
# | |
# See more detail near bottom of file on how we get round this using sub-shells. | |
# Actually we now tend to encapsulate things with " if xxx.sh ; then : ;fi ", which should also fix the issue. | |
# | |
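# A small sketch of the " if xxx ; then : ; fi " encapsulation mentioned above, assuming bash.
# The helper names tolerateFailure and abortOnFailure are hypothetical, for illustration only.

```shell
# Run the given command in a child bash which has "set -e" active,
# wrapped in an "if" so that a non-zero return code is tested rather than fatal.
tolerateFailure()
{
    bash -c 'set -e ; if "$@" ; then : ; fi ; echo "still running"' demo "$@"
}

# The same command without the wrapper: "set -e" ends the child before the echo.
abortOnFailure()
{
    bash -c 'set -e ; "$@" ; echo "still running"' demo "$@"
}
```

# With 'false' as the command, only tolerateFailure gets as far as printing "still running".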
# The shell interpreter in busybox is basically undocumented. | |
# The developer says "if it doesn't behave in a posix-compatible way, | |
# then complain and it might get fixed". | |
# | |
# There is a busybox forum where you can complain :-( | |
# | |
# a sort of manual for the ash shell can be found at | |
# http://www.digipedia.pl/man/doc/view/ash.1 | |
# see also | |
# http://linux.die.net/man/1/ash | |
# and | |
# http://www.manpages.com/man/sh-linux-man-page/ | |
# usage | |
# silentIfTerminatedBySigterm=true | |
# or silentIfInterruptedBySigint=true | |
# or silentIfTerminatedOrIfInt=true | |
# source /sbin/script-start.inc | |
# | |
# Set the quiet parameters to stop THIS script putting anything about receiving SIGINT and/or SIGTERM in the error log. | |
# They may be omitted and the behaviour is then as if you had set them false. | |
# | |
# It is VERY difficult to understand how bash works in this area, | |
# and even more to work out what parts of it the ash interpreter in busybox implements. | |
# | |
# I guess one could try reading the busybox source code, but good luck on that. | |
# | |
# This script is intended to work on bash and surprising amounts of it used to work on busybox on other models. | |
# It has not at present been tried with dash. | |
# | |
# It is worth defining some terms: | |
# primary script --- a top-level script, which DOES NOT INCLUDE THIS FILE, usually run from start-up scripts or an exe file. | |
# secondary script --- a script which DOES NOT INCLUDE THIS FILE, run by a primary script, or another secondary script | |
# service script --- a script run by a primary or secondary script which DOES NOT INCLUDE THIS FILE | |
# | |
# We think that in a quiet system, with no attached consoles, we can expect to see other processes | |
# request our primary script to die by issuing SIGTERM. As far as we can see, that will only be received | |
# by the primary script itself, between commands in it | |
# - it will not be passed to any commands or secondary/service scripts | |
# we may happen to be executing at the moment the signal is raised. | |
# | |
# So if we are hung up in a script or command we've called, we will pretty much ignore a SIGTERM sent to us. | |
# That certainly applies to "sleep n" commands - they are non-interruptible and hold off the processing of SIGTERM. | |
# | |
# If we receive an interrupt from an attached console, however, it appears that will go to the process | |
# which is live at that very moment - in other words it MAY be a secondary script. | |
# That script will then quit, with a simple error code, disguising the ACTUAL signal, | |
# ( EVEN IF IT INCLUDED ITS OWN COPY OF THIS FILE ) | |
# and we will report it as an error in THIS script, just as if a command we called went wrong. | |
# | |
# There's no point in making too much of this - it will cause double-reporting, not under-reporting. | |
# HOWEVER - it makes sense if scripts that expect, in the architecture we have designed, to be sent a SIGTERM | |
# do not then report that to the error log. | |
# We try to achieve this with the silentIfTerminatedBySigterm and silentIfInterruptedBySigint variables. | |
# This file uses the convention that functions and variables to be used ONLY within this file | |
# have names ending in double-underline. | |
# Full bash supports ${parameter=default} and ${parameter:=default} to force undefined parameters | |
# but it looks like ash / busybox doesn't. | |
# We can get around that easily enough | |
# | |
# we can use | |
# ${1-} meaning - tolerate missing first parameter - treat it as "" | |
# ${1-fred} meaning - tolerate missing first parameter - treat it as "fred" | |
# | |
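# A short sketch of the ${1-...} forms described above, with "set -u" active.
# The ':-' variant is added here only for contrast, and both helper names are hypothetical.

```shell
# Both helpers run in a subshell ( the function body is in parentheses )
# so that "set -u" does not leak into the calling script.
firstOrFred()      ( set -u ; echo "${1-fred}" )     # default only when $1 is missing
firstOrFredColon() ( set -u ; echo "${1:-fred}" )    # default when $1 is missing OR empty
```

# The two differ only when the first parameter is present but empty.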
# The next few lines ensure all 3 variables are set to true or false | |
# so we have no issues later once we start trapping unset variables... | |
# At this moment undefined variable trapping must be OFF. | |
# You can remove this to try going back to ash/dash/busybox. | |
. /usr/bin/ensure-shell-is-bash.inc.sh | |
# Error trapping is OFF at present. | |
doTrapErrors=true | |
if [ "${1-}" = "notrap" ] | |
then | |
doTrapErrors=false | |
fi | |
{ | |
if [ "true" != "${silentIfTerminatedBySigterm-false}" ] | |
then | |
silentIfTerminatedBySigterm="false" | |
fi | |
if [ "true" != "${silentIfInterruptedBySigint-false}" ] | |
then | |
silentIfInterruptedBySigint="false" | |
fi | |
if [ "true" = "${silentIfTerminatedOrIfInt-false}" ] | |
then | |
silentIfTerminatedBySigterm="true" | |
silentIfInterruptedBySigint="true" | |
fi | |
} | |
# @FIXME DGC 26-Oct-2017 | |
# This script should really be independent of running on a "Netrix box". | |
# so the following few lines import Netrix files as-and-when they are available. | |
# shellcheck disable=SC1091 | |
{ | |
# You can't create a function "sourceIfAvailable()" because
# any variable which the sourced code creates with 'declare' or 'typeset' would then
# be local to that function, and vanish when it returns.
# So we do the equivalent with an "executable string", which runs in the current context.
sourceIfAvailable='if [ -r "${target}" ] ; then source "${target}" ; fi' | |
target="/usr/bin/NetrixFileNames.inc.sh" ; eval ${sourceIfAvailable} | |
# Force the standard command path. | |
# There is no code in our system that uses a special path, so this is reasonable. | |
target="/sbin/set-std-path.inc.sh" ; eval ${sourceIfAvailable} | |
# We may need to alter the behaviour below based on the model we are running on | |
# as at this moment they use different issues of busybox. | |
# | |
# Currently the only place that can tell us that is webcontrol.conf | |
# though an alternative mechanism with a dedicated file ( say in / ) would be better. | |
# | |
# Import Control Centre configuration file | |
target="${configIncAPAFN}" ; eval ${sourceIfAvailable} | |
} | |
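# A sketch, assuming bash, of why the "executable string" is used above instead of a function.
# The helper sourcingDemo, its temporary file and the variable names plainVar / declaredVar
# are all hypothetical - they exist only to show the effect.

```shell
# Source a two-line file either from inside a function or via an eval'd string,
# then report which of its variables are still set afterwards.
sourcingDemo()
{
    local how="$1"          # "function" or "string"
    local demoFile
    demoFile="$(mktemp)"
    printf '%s\n' 'plainVar=kept' 'declare declaredVar=kept' > "${demoFile}"
    # The comparison runs in a child bash so nothing leaks into the caller.
    bash -s -- "${how}" "${demoFile}" <<'EOF'
target="$2"
if [ "$1" = "function" ]
then
    sourceInFunction() { source "${target}" ; }
    sourceInFunction
else
    sourceIfAvailable='if [ -r "${target}" ] ; then source "${target}" ; fi'
    eval "${sourceIfAvailable}"
fi
echo "plainVar=${plainVar-lost} declaredVar=${declaredVar-lost}"
EOF
    rm -f "${demoFile}"
}
```

# In the "function" case the variable made with 'declare' is lost on return; in the "string" case both survive.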
# set true/false to echo entry/exit/abort messages on scripts | |
echoOnEntryAndExit__=false | |
# Output to stdout the tails of all files matching a wildcard | |
# | |
# @param fileNamePattern Wildcard to match. | |
# @param lines Number of lines per file to output. | |
# @param logWidth Maximum number of character columns to print. | |
# | |
listFileTails() | |
{ | |
local fileNamePattern=$1 | |
local lines=$2 | |
local logWidth=$3 | |
local fileName | |
# These log files may not exist - which would cause script abort when running with set -e | |
# if we failed to test. | |
# | |
# shellcheck disable=SC2086 | |
if ls ${fileNamePattern} >/dev/null 2>&1 | |
then | |
# | |
# Stepping through the output of ls is normally deprecated if there is any chance of spaces | |
# or other 'interesting' characters in the file names. | |
# We have no such issues with our log files, on which this function is used. | |
# | |
for fileName in $(ls ${fileNamePattern})
do | |
echo "- - - - - - - ${fileName} - - - - - -" | |
tail -n"${lines}" "${fileName}" | cut -c1-"${logWidth}" | |
done | |
fi | |
} | |
# Appends a pile of 'useful' information about the state of the machine to a file. | |
# | |
# @param filename to be written to. | |
# | |
outputDebugInfoToFile() | |
{ | |
local outputOPAFN=$1 | |
# FIXME DGC 17-1-2013 | |
# ought to defend ourselves against $1 being blank | |
# maybe even check the path part exists | |
# | |
# Not a serious issue, but it's slack programming. | |
# | |
{ | |
# You can get the ACTUAL console width under bash. | |
local logWidth; logWidth=130 # enough to catch useful stuff, but excludes long long lines from webcore | |
# FIXME DGC 17-1-2013
# | |
# adding file handle counts here would be good | |
echo "-------------------------- date -----------------------------" | |
date | |
echo "------------------------- ps ax -----------------------------" | |
# FIXME DGC 3-4-2012 | |
# nasty shortcut here - we happen to know that busybox ignores a "ax" parameter | |
# even though that's not documented behaviour. | |
# but it is required on big linux ps to get all users. | |
# | |
# show full output of ps, not simplified ps.sh columns, - in case the rest is useful | |
# | |
ps ax | cut -c1-${logWidth} | |
echo "------------------------- whoami ----------------------------" | |
whoami | |
echo "--------------------------- df ------------------------------" | |
# @FIXME DGC 16-Mar-2016 | |
# See big FIXME in routine 'getRootFileSystemName()' | |
# In the NetrixShellMacros.inc.sh file. | |
# | |
df 2>/dev/null | |
echo "--------------------- cat /proc/meminfo ---------------------" | |
cat /proc/meminfo | |
echo "------------------------ route -n ---------------------------" | |
/sbin/route -n | |
echo "------------------------ ifconfig ---------------------------" | |
/sbin/ifconfig | |
echo "------------------- cat /etc/resolv.conf --------------------" | |
cat /etc/resolv.conf | |
echo "--------------------- Tail of dmesg. ------------------------" | |
dmesg | grep -v termios | tail -n 30 | cut -c1-${logWidth} | |
echo | |
echo "-- lines in process logs possibly reporting a script error --" | |
echo | |
if [ -n "${consoleLogsAPADN-}" ]
then | |
# @FIXME DGC 27-Feb-2017 | |
# If any of the .log files are not readable by the current user, this emits an error. | |
# It may be that we need to do a 'find' for files we have the right to read before we grep. | |
# Obviously will all work fine when we are root, which we usually are. | |
# | |
find "${consoleLogsAPADN}" -mount -mindepth 2 -maxdepth 2 -readable -name "*.log" \ | |
-exec grep -e ": line" {} \; | |
echo | |
echo "---------------- Tails of all process logs. -----------------" | |
# The patterns are double-quoted so the variable expands here,
# while the wildcard is left for listFileTails to expand.
{
listFileTails "${consoleLogsAPADN}/*/current" 10 "${logWidth}"
listFileTails "${consoleLogsAPADN}/ppp/ip-*.txt" 20 "${logWidth}"
listFileTails "${consoleLogsAPADN}/openvpn/vpn*.txt" 20 "${logWidth}"
}
fi | |
if [ -n "${pppdScriptsLogAPADN-}" ]
then | |
# @FIXME DGC 27-Feb-2017 | |
# If any of the .log files are not readable by the current user, this emits an error. | |
# It may be that we need to do a 'find' for files we have the right to read before we grep. | |
# Obviously will all work fine when we are root, which we usually are. | |
# | |
echo "--------------- Tail of all pppd scripts logs. ----------------"
# The pattern is double-quoted so the variable expands here,
# while the wildcard is left for listFileTails to expand.
{
listFileTails "${pppdScriptsLogAPADN}/auth-*.txt" 20 "${logWidth}"
}
fi | |
} >> "${outputOPAFN}" | |
sync | |
} | |
# Set these only to 'true' or 'false'. | |
stopOnUnexpectedSignals=true | |
logUnexpectedSignals=true | |
# Trap routine for most signals/event in scripts. | |
# | |
# @param Dexdyne-defined text corresponding to the name of the signal. | |
# @param The script in which the issue occurred. | |
# @param The line number at which the issue occurred. | |
# @param Use true/false to decide if the trap routine should force a script exit. | |
# @param If we do force an exit, this is the script exit code. | |
# | |
# If this routine does not explicitly 'exit' then on return the script will | |
# 'go on doing what it was going to do' | |
# For some signals that won't be much as most are disregarded anyway, so the script will | |
# just keep on executing. | |
# For signals like int/term, the signal will have the effect it would have had if you hadn't trapped it. | |
# | |
unexpectedTrap__() # Routine name is deliberately non-conformant, so it should not clash with any user routine. | |
{ | |
local signalName="$1" | |
local faultFile="$2" | |
local faultLineNumber="$3" | |
local doExit="$4" | |
local exitCode="$5" | |
# | |
# Bash apparently provides the $LINENO variable, | |
# but other interpreters ( like the busybox ash we expect to be using on the AVR32/N7000 ) may not. | |
# | |
echo "Unexpected ${signalName} received at line ${faultLineNumber:-(unknown)} in script ${faultFile}" | |
if [ "${signalName}" != "SIGWINCH" ] # The user changing the size of the window we are printing into is NOT a serious issue
# and we just ignore it. Won't see it in an embedded situation anyway.
then | |
filename="/reboots/last-unexpected-trap-in-script" | |
echo "Unexpected ${signalName} received at line ${faultLineNumber:-(unknown)} in script ${faultFile}" > ${filename} | |
outputDebugInfoToFile ${filename} | |
# Allow for the option not to log unexpected signals to error log. | |
if ${logUnexpectedSignals} | |
then | |
# it may or may not be reasonable to try to append to the event log | |
# - depending on the exact disaster we have suffered. | |
# but may as well give it a try | |
${appendEventLogExeAPAFN} "unexpected ${signalName} received in script ${faultFile}" | |
fi | |
# Allow for the option to treat script errors as soft. | |
if ${stopOnUnexpectedSignals} | |
then | |
# Exclude signals which we have seen and don't seem serious. | |
# | |
# Assume we always get the SIGxxx format here, whether the user invoked it with or without the "SIG" part. | |
# | |
if ${echoOnEntryAndExit__} | |
then | |
echo "Aborting script ${faultFile} because of unexpected trap."
fi | |
# Just in case it all goes wrong, and we don't exit | |
# set this global flag that can be tested by the script we're servicing. | |
G_exitWanted=1 | |
# This exits the offending script that included us!!! | |
# No idea if this is always the correct action - but we can't afford to loop adding to the event log. | |
# | |
do_exit 99 "${faultFile}" "${faultLineNumber}" | |
fi | |
fi | |
if ${doExit} | |
then | |
exit "${exitCode}" | |
fi | |
} | |
# Trap routine for script errors,
# which will never be executed if the set -e flag is active, as that turns errors into exits.
# | |
# We do what we can by making a permanent record of the last few lines of the console log | |
# of every running process which is logging to ${consoleLogsAPADN} | |
# one of those should contain the info we need to see what happened | |
# | |
unexpectedError__() # Routine name is deliberately non-conformant, so it should not clash with any user routine. | |
{ | |
local faultFile="$1" | |
local faultLineNumber=$2 | |
# @FIXME DGC 26-Oct-2017 | |
# This gives incorrect information. | |
# The line number is correct, but if a file is included in another | |
# then $faultFile refers to the including, not the included, file. | |
echo "Error was trapped at line ${faultLineNumber} in script ${faultFile}" | |
filename="/reboots/last-script-error-or-bare-exit" | |
echo "Error was trapped at line ${faultLineNumber} in script ${faultFile}" > ${filename} | |
outputDebugInfoToFile ${filename} | |
# Allow for the option to treat script errors as soft. | |
if true | |
then | |
# It may or may not be reasonable to try to append to the event log | |
# - depending on the exact disaster we have suffered. | |
# but may as well give it a try | |
# @FIXME DGC 27-Feb-2017 | |
# At present only the root user can write the event log. | |
# That is actually a bug I think. | |
# The following lines will all echo error messages to stdout, which I assume is harmless | |
# - if you had to avoid them you could test 'whoami' but that risks | |
# silliness if we fix the underlying bug. | |
# | |
${appendEventLogExeAPAFN} "Script error noted - request Dexdyne support to examine logs." | |
${appendEventLogExeAPAFN} "\$1: Error was trapped at line ${faultLineNumber} in script ${faultFile}" | |
${appendEventLogExeAPAFN} "\$1: Debug info can be found in file ${filename}" | |
fi | |
# We could just log the problem and continue the script. | |
# At present we choose to force the script to exit. | |
do_exit 98 | |
} | |
# Trap routine for exit from script, | |
# which also catches reading uninitialised variables when the "set -u" flag is active. | |
# It would catch all errors if we set "set -e", but we currently don't. | |
# | |
# If you trap EXIT, all that happens is that you execute these commands on the way out | |
# - you can't/don't avoid doing the exit. | |
# | |
# If you run another exit, that over-rides the return code, | |
# otherwise the code returned is the one that originally brought us here, | |
# so if a command returns 47, and thereby causes a trap under "set -e" | |
# we will return 47 from this script, even though we run commands here, | |
# unless we make an effort not to. | |
# | |
# It appears we can't access the rc or "exit code" | |
# of the last command or error before we came here - shame. | |
# So though we will return it - we don't know what it is. | |
# | |
# Nor can we, under ash, read the line number $LINENO in which an error occurred. | |
# | |
# We do what we can by making a permanent record of the last few lines of the console log | |
# of every running process which is logging to ${consoleLogsAPADN} | |
# one of those should contain the info we need to see what happened | |
# | |
unexpectedExit__() # Routine name is deliberately non-conformant, so it should not clash with any user routine. | |
{ | |
local faultFile="$1" | |
local faultLineNumber=$2 | |
# @FIXME DGC 28-Feb-2017 | |
# Line number always seems to be 1 for any exit, though it works for errors. | |
# If we become convinced of that, stop looking at it. | |
if ${errorTrapDefused__} | |
then | |
return | |
fi | |
echo "Bare exit ( or uninitialised variable access ) was trapped in script ${faultFile}." | |
runExitScript_niu | |
filename="/reboots/last-script-error-or-bare-exit" | |
echo "Bare exit ( or uninitialised variable access ) encountered at line ${faultLineNumber} in script ${faultFile}" > ${filename} | |
outputDebugInfoToFile ${filename} | |
# Allow for the option to treat script errors as soft. | |
if true | |
then | |
# It may or may not be reasonable to try to append to the event log | |
# - depending on the exact disaster we have suffered | |
# but may as well give it a try | |
# @FIXME DGC 27-Feb-2017 | |
# At present only the root user can write the event log. | |
# That is actually a bug I think. | |
# The following lines will all echo error messages to stdout, which I assume is harmless | |
# - if you had to avoid them you could test 'whoami' but that risks | |
# silliness if we fix the underlying bug. | |
# | |
${appendEventLogExeAPAFN} "Script exit/uninitialised var noted - request Dexdyne support to examine logs." | |
${appendEventLogExeAPAFN} "\$1: Bare exit encountered, or uninitialised variable read, at line ${faultLineNumber} in script ${faultFile}" | |
${appendEventLogExeAPAFN} "\$1: Debug info can be found in file ${filename}" | |
if ${echoOnEntryAndExit__} | |
then | |
echo "Quitting script ${faultFile}" | |
fi | |
fi | |
# If we reach this point then the trap will continue on and exit the script with the original exit code. | |
# @FIXME DGC 28-Feb-2017 | |
# If what we encountered was a script 'running-off-the-end' or just executing 'exit' | |
# then the exit code is zero.... which indicates success. | |
# Unfortunately I think we've looked into this before, and we cannot access that code, | |
# we can only quit, and then it will become apparent what it was. | |
# So we have a choice here: | |
# - we can quietly quit, and return the original code,including zero. | |
# - we can force an exit with a code of 97 or something. | |
# Currently we choose the latter - as it at least makes sure the caller sees a failure. | |
exit 97 | |
} | |
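# A sketch relating to the note above unexpectedExit__ about not being able to read the exit code.
# Under bash, $? on entry to an EXIT trap appears to hold the status the script is about to return,
# so passing it to the handler as the first thing the trap does may be enough to recover it.
# This has not been tried under ash / busybox. The helper names are hypothetical.

```shell
# Run a child bash which exits with the requested code, and show what its EXIT trap sees.
exitCodeSeenByTrap()
{
    bash -s -- "$1" <<'EOF'
reportExit()
{
    echo "trap saw exit code $1"
}
# Single quotes delay the expansion of $? until the trap actually fires.
trap 'reportExit "$?"' EXIT
exit "$1"
EOF
}
```

# For example 'exitCodeSeenByTrap 47' runs a child which exits with 47 and reports what the trap was given.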
# Function executed asynchronously on receipt of a SIGINT signal | |
# | |
# This is what we receive if someone types CTRL-C on a terminal connected to the process. | |
# | |
# In Netrix this doesn't happen at run-time, so my interest in doing the following perfectly is limited. | |
# | |
# See notes at the top about the fact that while THIS process may receive | |
# and handle a SIGINT - the process to which we return may see the issue as a | |
# trapped error ( because of our exit code ) - not as a SIGINT. | |
# | |
# IT IS POSSIBLE that we should react to this signal by sending an equivalent signal to our parent process.
# However I see descriptions of something called a "process group",
# to which signals like SIGTERM appear to be broadcast so that every member gets them
# - more investigation needed if the issue becomes significant.
# | |
receivedSigint__() # Routine name is deliberately non-conformant, so it should not clash with any user routine. | |
{ | |
local faultFile="$1" | |
local faultLineNumber=$2 | |
runExitScript_niu | |
if ! ${silentIfInterruptedBySigint} | |
then | |
# It may or may not be reasonable to try to append to the event log | |
# - depending on the exact disaster we have suffered. | |
# but may as well give it a try. | |
# The trigger is external, so I assume $faultLineNumber is not significant. | |
${appendEventLogExeAPAFN} "\$1: unexpected SIGINT received in script ${faultFile} - terminating" | |
if ${echoOnEntryAndExit__} | |
then | |
echo "Terminating script ${faultFile} because of unexpected SIGINT." | |
fi | |
fi | |
# I believe that trapping this signal stops the otherwise default action which would terminate the script. | |
# We would like to go ahead and exit, so we have to do it for ourselves with an exit statement. | |
# Just in case it all goes wrong, and we don't exit | |
# set this global flag that can be tested by the script we're servicing. | |
G_exitWanted=1 | |
# We choose not to return the original code, and force an exit code of 99 instead.
do_exit 99 "${faultFile}" "${faultLineNumber}" # Exits the including script. | |
} | |
# Function executed asynchronously on receipt of a SIGTERM signal | |
# | |
# This is what we receive if someone/something simply says "kill 123" | |
# | |
# Netrix DOES use this to stop "watcher and shepherd scripts" so we must do it properly. | |
# In particular we shouldn't moan in the error log about something we expected to happen. | |
# | |
# Currently no Netrix scripts trap SIGTERM for any practical purpose ( like releasing lock files or similar ) | |
# so we don't have to consider a script which over-rides this trap. | |
# | |
receivedSigterm__() # Routine name is deliberately non-conformant, so it should not clash with any user routine. | |
{ | |
local faultFile="$1" | |
local faultLineNumber=$2 | |
runExitScript_niu | |
if ! ${silentIfTerminatedBySigterm} | |
then | |
# It may or may not be reasonable to try to append to the event log | |
# - depending on the exact disaster we have suffered. | |
# but may as well give it a try. | |
# @FIXME DGC 7-8-2013 | |
# This is not the appropriate test on the 8000 box. | |
# We would need to check for the use of 'service [netrix | comms | networking] stop'.
# @FIXME DGC 27-Feb-2017 | |
# The very helpful 'shellcheck' suggests | |
# pgrep -f "K06netrix.sh" | |
# as a better alternative to grepping the output of ps aux | |
# | |
# @FIXME DGC 1-Feb-2018 | |
# I assume this logic all goes to hell in a handcart on systemd machines. | |
# shellcheck disable=SC2009 | |
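# The [K] bracket stops grep matching its own entry in the ps output, and -q keeps the matches off stdout.
if ps aux | grep -q "[K]..netrix" # crude synonym for "box is shutting down - should catch K06netrix and K??netrixshutdown"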
then | |
: # If unit is closing down we don't take any notice of SIGTERM reports | |
# the scripts OUGHT to know how to die without reporting an error, but no reason to blather in the event log. | |
else | |
# the trigger is external, so I assume $faultLineNumber is not significant | |
${appendEventLogExeAPAFN} "\$1: unexpected SIGTERM received in script ${faultFile} - terminating" | |
if ${echoOnEntryAndExit__} | |
then | |
echo "Terminating script ${faultFile} because of unexpected SIGTERM." | |
fi | |
fi | |
fi | |
# I believe that trapping this signal stops the otherwise default action which would terminate the script. | |
# We would like to go ahead and exit, so we have to do it for ourselves with an exit statement. | |
# Just in case it all goes wrong, and we don't exit | |
# set this global flag that can be tested by the script we're servicing. | |
# shellcheck disable=SC2034 | |
G_exitWanted=1 | |
# This exits the including script. | |
do_exit 99 "${faultFile}" "${faultLineNumber}" | |
} | |
# If a script receives SIGPIPE, it's expected to shut down | |
# | |
# We honestly don't know whether having trapped this we could choose to soldier on, | |
# or whether this is just a temporary diversion, and we will exit the script when the trap exits. | |
# | |
# Don't think any netrix code pipes output into a script at present, | |
# ( though I think we once used it as a substitute for svlogd when it wasn't available ) | |
# so we will probably never encounter this. | |
# | |
receivedSigpipe__() # Routine name is deliberately non-conformant, so it should not clash with any user routine. | |
{ | |
local faultFile="$1" | |
local faultLineNumber="$2" | |
# Next line is for debug only. | |
echo "SIGPIPE received in script ${faultFile}" | |
runExitScript_niu | |
# The trigger is external, so assume $faultLineNumber contains no useful information. | |
${appendEventLogExeAPAFN} "\$1: SIGPIPE received in script ${faultFile} - process our output was being piped to died on us." | |
if ${echoOnEntryAndExit__} | |
then | |
echo "Terminating script ${faultFile} because of SIGPIPE." | |
fi | |
# Just in case it all goes wrong, and we don't exit | |
# set this global flag that can be tested by the script we're servicing. | |
# shellcheck disable=SC2034 | |
G_exitWanted=1 | |
# This exits the including script. | |
do_exit 99 "${faultFile}" "${faultLineNumber}" | |
} | |
runExitScript_niu() | |
{ | |
# Use of this is suspended until we find it something to do that we can't do inline here! | |
return 0 | |
} | |
# Function which should be called by scripts that have included this file, in order to exit. | |
# | |
# We cannot distinguish between an error trapped by "set -e" ( or "set -u" ) and an exit statement,
# so we forbid the use of simple 'exit' in scripts working with us. | |
# | |
# We provide this function as a way for them to exit | |
# | |
# @param [opt] Desired exit code - defaults to zero. | |
# | |
do_exit() | |
{ | |
local rc="${1-0}" # The exit code we ought to return from the script. | |
# echo "Running do_exit ${rc} in script ${faultFile}" | |
# At one time we used to turn exit trapping off, but that's not compatible with stacked handling. | |
# So all we do now is to tell our trap to do nothing. | |
errorTrapDefused__=true | |
# we have to trust the rest of this function, to avoid recursing if there's a further error. | |
allowUninitialisedVariables | |
runExitScript_niu | |
# Next line is totally unnecessary - all traps are nullified by exiting the script. | |
trap - SIGINT SIGTERM | |
# Leave the exit trap running, though now the trap routine will do nothing. | |
if ${echoOnEntryAndExit__} | |
then | |
echo "do_exit is exiting script ${faultFile} with return code $rc." | |
fi | |
# Just in case it all goes wrong, and we don't exit | |
# set this global flag that can be tested by the script we're servicing. | |
# shellcheck disable=SC2034 | |
G_exitWanted=1 | |
exit "${rc}" # Exit THE INCLUDING SCRIPT, via whatever exit trap(s) are in place. | |
} | |
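# A cut-down sketch of the "defused trap" idea used by do_exit above, assuming bash.
# The names trapDefused, onExit and plannedExit are hypothetical stand-ins
# for errorTrapDefused__, unexpectedExit__ and do_exit.

```shell
# Run a child bash which leaves either by a bare exit or through the planned route,
# and show whether its EXIT trap complains.
defuseDemo()
{
    bash -s -- "$1" <<'EOF'
trapDefused=false
onExit()
{
    # A planned exit has already set the flag, so stay quiet.
    if ${trapDefused} ; then return ; fi
    echo "bare exit trapped"
}
plannedExit()
{
    trapDefused=true
    exit "${1-0}"
}
trap onExit EXIT
if [ "$1" = "planned" ] ; then plannedExit 0 ; fi
exit 0
EOF
}
```

# 'defuseDemo planned' is silent, while 'defuseDemo bare' reports the bare exit.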
allowErrors() | |
{ | |
set +e | |
} | |
allowUninitialisedVariables() | |
{ | |
set +u | |
} | |
# Find the active trap for a signal, if there is one. | |
# | |
# @param The name of something which can be trapped by the bash 'trap' command. | |
# In the short format, not including a leading 'SIG' | |
# EXIT is assumed if the parameter is missing. | |
# | |
# @return Success if the signal is being trapped. | |
# On success: | |
# The global variable "$G_reinstateActiveTrapCommand" is set to a command which can be used to reinstate the current trap. | |
# The global variable "$G_activeTrapRoutine" is set to the name of the current trap routine.
# | |
# If no trap is active, they will return an empty string, which is also a legal bash command. | |
# | |
findActiveTrapFor() | |
{ | |
trapName="${1-EXIT}" # Setting a default is easier than providing a validity test. | |
# There are 4 'special' traps in addition to the standard signals. | |
# The output of 'kill -l' ( which mirrors 'trap -l' ) includes the SIG prefix. | |
# | |
if [ "EXIT" != "${trapName}" ] \ | |
&& [ "DEBUG" != "${trapName}" ] \ | |
&& [ "RETURN" != "${trapName}" ] \ | |
&& [ "ERR" != "${trapName}" ] \ | |
&& ! kill -l | grep -q "SIG${trapName}" | |
then | |
echo "Routine was asked for active trap for signal '${trapName}'," | |
echo " but that isn't a special EXIT / DEBUG / RETURN / ERR token and"
echo " doesn't appear in the output of 'kill -l'" | |
do_exit 1 | |
fi | |
# The output from 'trap -p' is in the format | |
# trap -- 'exitRoutine' EXIT | |
# which is usable as a command to reinstate the trap in question. | |
G_reinstateActiveTrapCommand="$(trap -p "${trapName}")"
G_activeTrapRoutine="" | |
if [ -n "${G_reinstateActiveTrapCommand}" ] | |
then | |
# @FIXME DGC 10-Apr-2017 | |
# There is some slicker code to do this using parameter extraction rather than 'cut', | |
# in shell-macros.inc.sh. | |
G_activeTrapRoutine=$(echo ${G_reinstateActiveTrapCommand} | cut -d' ' -f3 | cut -d"'" -f2) | |
fi | |
[ -n "${G_reinstateActiveTrapCommand}" ] | |
} | |
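# A possible shape for the "slicker" extraction mentioned in the FIXME inside findActiveTrapFor,
# using parameter expansion rather than 'cut'.
# This is a minimal sketch, not the code from shell-macros.inc.sh:
# the helper name trapRoutineFromTrapLine and the sample line are assumptions made for illustration,
# and trap commands which themselves contain single quotes are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first word of the command held in one line of 'trap -p' output.
trapRoutineFromTrapLine()
{
    local line="${1}"
    local body
    body=${line#*\'}      # drop everything up to and including the first single quote
    body=${body%\'*}      # drop the last single quote and the signal name which follows it
    # Word-split on purpose: this discards leading blanks and isolates the first word.
    # shellcheck disable=SC2086
    set -- ${body}
    printf '%s\n' "${1-}"
}

# Sample line in the format which 'trap -p' prints ( the \$ keeps ${0} and ${LINENO} literal ).
exampleTrapLine="trap -- ' unexpectedExit__ \${0} \${LINENO} ' EXIT"
trapRoutineFromTrapLine "${exampleTrapLine}"
```

# The 'set --' inside the function only replaces the function's own positional parameters,
# so the parameters of the including script are left alone.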
#################### end of function definitions - execution starts here #################################### | |
# shellcheck disable=SC2034 | |
G_exitWanted=0 | |
if ${doTrapErrors} && findActiveTrapFor EXIT | |
then | |
${appendEventLogExeAPAFN} "script-start.inc.sh found that a trap on EXIT was previously set. Can't work with that." | |
# Consider a stack-dump here? | |
exit 1 # Quit the whole script which included us. | |
fi | |
# This script has to be trusted - do this for clarity, though probably already the case. | |
allowErrors | |
allowUninitialisedVariables | |
# If this variable is set, we are running for the second time in the same script file - we shouldn't. | |
if [ -n "${haveRunScriptStart__}" ] | |
then | |
echo "****************************** script-start.inc.sh used repeatedly - that's an error. *********************************" | |
fi | |
haveRunScriptStart__="true" | |
if ${echoOnEntryAndExit__} | |
then | |
# This would cause an 'uninitialised variable' error if that situation was trapped already.
# Bash now allows included scripts to be given parameters on the same line.
# The following line is intended to print the positional parameters of the including script,
# and I'm not sure how that interacts with the new capability.
# It's also worth noting that for this to work we must include this script
# BEFORE decoding the script positional parameters, since we now use 'shift' on each one
# which would make them unavailable afterwards.
#
# "$*" prints however many parameters there are, joined by spaces,
# rather than assuming there are no more than eight.
#
echo "Starting script $0 $*"
fi | |
# This setting forces the return code of a pipeline to be 'fail' if any of the commands in the pipe fail. | |
# I have tested that this does what it says - without it the script does not notice a non-zero return code | |
# from an intermediate stage. | |
# Despite that simple testing this remains new-and-experimental! | |
# We may find places in our scripts where the other behaviour is required, | |
# and we actually DESIRE to tolerate intermediate failures in a pipeline. | |
# | |
set -o pipefail | |
# "set -o pipefail" makes a failure in any stage of a
# pipeline become the return code of the whole pipeline.
# That means that if we go, say, stuff="$(route -n | grep xxx | sed .... )"
# the absence of the text xxx in the output of 'route' makes grep return a non-zero code,
# which is interpreted as an error, and in turn causes an error trap.
# So if we intend to quietly return an empty string when the xxx is not present we need to append " || true ".
# In the shell grammar '||' binds more loosely than '|', so in
# stuff="$(route -n | grep xxx | sed .... || true )"
# the "|| true" applies to the entire pipeline, not just to 'sed'.
# The form we have actually tested, and use, is:
# stuff="$(route -n | grep xxx | sed .... )" || true
# which does the trick.
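# The effect of pipefail, and of the two positions for "|| true", can be checked with a small experiment.
# This is a sketch only: 'printf' stands in for 'route -n', and the variable names are invented for the demonstration.

```shell
#!/usr/bin/env bash
set -o pipefail

# Unguarded: grep finds nothing and returns 1, and pipefail makes that the status of the pipeline.
unguardedStatus=0
printf 'alpha\nbeta\n' | grep xxx | sed 's/a/A/' || unguardedStatus=$?

# Guard inside the command substitution: '|| true' covers the whole pipeline.
insideGuard="$(printf 'alpha\nbeta\n' | grep xxx | sed 's/a/A/' || true)"
insideStatus=$?

# Guard outside the command substitution: the assignment fails, and '|| true' absorbs it.
outsideGuard="$(printf 'alpha\nbeta\n' | grep xxx | sed 's/a/A/')" || true
outsideStatus=$?

echo "unguarded=${unguardedStatus} inside=${insideStatus} outside=${outsideStatus}"
echo "inside='${insideGuard}' outside='${outsideGuard}'"
```

# Both guarded forms leave the variable empty and report success; only the unguarded pipeline reports a failure.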
if ${doTrapErrors} | |
then | |
# There is a facility in bash to make all errors which would be trapped as "ERR" trigger an exit instead. | |
# Now as it happens the "which line number" feature seems to work for errors but not exits. | |
# Plus we can set up a separate trap for errors, and issue a better report than cramming everything into one. | |
# Therefore we don't do "set -e" - in fact we positively turn it off. | |
set +e | |
# Set up features to allow exit to be forced on reading from uninitialised variables. | |
# @FIXME DGC 28-Feb-2017 | |
# It seems obvious to a blind man running that this should turn into an ERR, not an EXIT | |
# but that's not the way it works in bash :-( | |
set_u="-u" | |
restoreExitOnUninitialisedVariableTrapping() | |
{ | |
set ${set_u} | |
} | |
setUpExitOnUninitialisedVariableTrapping() | |
{ | |
restoreExitOnUninitialisedVariableTrapping | |
} | |
# | |
# The semantics of "trap" is that on receiving the signal named as the 2nd parameter | |
# we run the text string given as the first parameter as a script. | |
# Use single quotes so that things like $0 and $LINENO are interpreted as if during the line which failed | |
# not the line which sets up the trap. | |
# ( LINENO seems to be badly defined for exits; it works OK for ERR traps. ) | |
# NOTE - the routine name given is NOT evaluated until used. | |
# so it is perfectly possible to set up a trap for a mis-typed routine name | |
# and never find out until such an error is trapped for the first time years later. | |
# Extra care is needed to ensure the routines we invoke actually exist. | |
# It's possible we could fix that by holding the function names in variables, | |
# and trapping uninitialised variables, but we haven't tried, and it would make for difficult reading. | |
# | |
trap ' unexpectedError__ ${0} ${LINENO} ' ERR # ? | |
# NB when access to an uninitialised variable is faulted, it forces an exit, not an error. | |
trap ' unexpectedExit__ ${0} ${LINENO} ' EXIT # 0 | |
errorTrapDefused__=false | |
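# One way to catch a mis-typed routine name early is to check, when the traps are set up,
# that each routine named in a trap is a defined function.
# A minimal sketch, assuming bash: assertTrapRoutinesExist and demoHandler__ are hypothetical names
# and are not defined anywhere in this file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any named trap routine which is not a defined shell function.
assertTrapRoutinesExist()
{
    local routine missing=0
    for routine in "$@"
    do
        # 'declare -F name' succeeds only if 'name' is a defined function.
        if ! declare -F "${routine}" > /dev/null
        then
            echo "trap routine '${routine}' is not defined" >&2
            missing=1
        fi
    done
    return "${missing}"
}

demoHandler__() { : ; }     # stand-in for a real handler such as unexpectedError__

if assertTrapRoutinesExist demoHandler__
then
    echo "all trap routines exist"
fi
```

# In this file the call might look like
# assertTrapRoutinesExist unexpectedError__ unexpectedExit__ || do_exit 1
# placed after the function definitions and before the 'trap' commands.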
# Since all ( ??? ) other signals are asynchronous and external to the actions of the script itself, | |
# there is little point in passing the line number at which the script stopped running, | |
# However if we are relying on an entry written to the event log, we will want to know the name of the script. | |
trap ' receivedSigint__ ${0} ${LINENO} ' SIGINT # 2 | |
trap ' receivedSigterm__ ${0} ${LINENO} ' SIGTERM # 15 | |
trap ' receivedSigpipe__ ${0} ${LINENO} ' SIGPIPE # 13 process we are piping to has stopped. | |
# FIXME DGC 16-3-2012 | |
# | |
# NOTE THAT THE FOLLOWING IS LAZY CODING
# | |
# The perfect way to do this is to run kill -l which spits out a list like the one below from a DX2, and decode it. | |
# Note that the numbers 1 to 9 are fixed since early Linux (shell ?) versions, | |
# but higher numbers have shifted about a bit across versions encountered by Dexdyne | |
# ( and some signals have come and gone ) | |
# so only the textual labels offer any certainty of selecting the desired signal. | |
# | |
# 1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP | |
# 6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1 | |
# 11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM | |
# 16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP | |
# 21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ | |
# 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR | |
# 31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3 | |
# 38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8 | |
# 43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13 | |
# 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12 | |
# 53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7 | |
# 58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2 | |
# 63) SIGRTMAX-1 64) SIGRTMAX | |
# | |
# then process that so as to trap all signals we don't otherwise deal with | |
# -- leave that as an exercise for another day. | |
# Beware - only the versions without "SIG" on the front | |
# are acceptable to kill commands in shell scripts on the n8000. | |
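# The "decode the output of kill -l" exercise could start from something like the following.
# It is a sketch under assumptions: listTrappableSignals is a hypothetical name,
# the choice to skip CHLD and the real-time signals is ours for the example,
# and it has not been tried on the n8000.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one signal name per line, without the SIG prefix,
# leaving out signals which cannot be trapped or which we choose not to trap.
listTrappableSignals()
{
    local token
    # Word-split the output of 'kill -l' into tokens such as "1)" and "SIGHUP".
    for token in $(kill -l)
    do
        token="${token#SIG}"
        case "${token}" in
            *\)) ;;                  # index tokens such as "12)"
            KILL|STOP) ;;            # cannot be trapped
            CHLD|RT*) ;;             # deliberately left alone in this sketch
            *) echo "${token}" ;;
        esac
    done
}

listTrappableSignals
```

# A loop over that list could then set a trap for each name not already handled explicitly, for example
# trap " unexpectedTrap__ SIG${name} \${0} \${LINENO} true 1 " "${name}"
# where the escaped \$ defers expansion of ${0} and ${LINENO} until the trap fires.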
# A reminder of the parameters to unexpectedTrap__: | |
# | |
# @param Dexdyne-defined text corresponding to the name of the signal. | |
# @param The script in which the issue occurred. | |
# @param The line number at which the issue occurred. | |
# @param true/false to decide if the trap routine should force a script exit. | |
# @param if we do force an exit, this is the script exit code. | |
# numeric | |
# value | |
# on n8000 | |
trap ' unexpectedTrap__ SIGHUP ${0} ${LINENO} true 1 ' SIGHUP # 1 | |
trap ' unexpectedTrap__ SIGQUIT ${0} ${LINENO} true 1 ' SIGQUIT # 3 | |
trap ' unexpectedTrap__ SIGILL ${0} ${LINENO} true 1 ' SIGILL # 4 | |
trap ' unexpectedTrap__ SIGTRAP ${0} ${LINENO} true 1 ' SIGTRAP # 5 | |
trap ' unexpectedTrap__ SIGABRT ${0} ${LINENO} true 1 ' SIGABRT # 6 | |
trap ' unexpectedTrap__ SIGBUS ${0} ${LINENO} true 1 ' SIGBUS # 7 | |
trap ' unexpectedTrap__ SIGFPE ${0} ${LINENO} true 1 ' SIGFPE # 8 | |
# @FIXME DGC 27-Feb-2017 | |
# According to the very helpful "shellcheck", this is a waste of time, | |
# as SIGKILL and SIGSTOP cannot be trapped :-) | |
# shellcheck disable=SC2173 | |
trap ' unexpectedTrap__ SIGKILL ${0} ${LINENO} true 1 ' SIGKILL # 9 | |
trap ' unexpectedTrap__ SIGUSR1 ${0} ${LINENO} true 1 ' SIGUSR1 # 10 | |
trap ' unexpectedTrap__ SIGSEGV ${0} ${LINENO} true 1 ' SIGSEGV # 11 | |
trap ' unexpectedTrap__ SIGUSR2 ${0} ${LINENO} true 1 ' SIGUSR2 # 12 | |
trap ' unexpectedTrap__ SIGALRM ${0} ${LINENO} true 1 ' SIGALRM # 14 | |
trap ' unexpectedTrap__ SIGSTKFLT ${0} ${LINENO} true 1 ' SIGSTKFLT # 16 | |
# It's normal that a script has child processes that terminate - so this should not be trapped. | |
# | |
# we could trap it and take no action, but it would just waste processor power. | |
# | |
#trap ' unexpectedTrap__ SIGCHLD ${0} ${LINENO} false 0 ' SIGCHLD # 17 | |
trap ' unexpectedTrap__ SIGCONT ${0} ${LINENO} true 1 ' SIGCONT # 18 | |
# @FIXME DGC 27-Feb-2017 | |
# According to the very helpful "shellcheck", this is a waste of time, | |
# as SIGKILL and SIGSTOP cannot be trapped :-) | |
# shellcheck disable=SC2173 | |
trap ' unexpectedTrap__ SIGSTOP ${0} ${LINENO} true 1 ' SIGSTOP # 19 | |
trap ' unexpectedTrap__ SIGTSTP ${0} ${LINENO} true 1 ' SIGTSTP # 20
trap ' unexpectedTrap__ SIGTTIN ${0} ${LINENO} true 1 ' SIGTTIN # 21 | |
trap ' unexpectedTrap__ SIGTTOU ${0} ${LINENO} true 1 ' SIGTTOU # 22 | |
trap ' unexpectedTrap__ SIGURG ${0} ${LINENO} true 1 ' SIGURG # 23 | |
trap ' unexpectedTrap__ SIGXCPU ${0} ${LINENO} true 1 ' SIGXCPU # 24 | |
trap ' unexpectedTrap__ SIGXFSZ ${0} ${LINENO} true 1 ' SIGXFSZ # 25 | |
trap ' unexpectedTrap__ SIGVTALRM ${0} ${LINENO} true 1 ' SIGVTALRM # 26 | |
trap ' unexpectedTrap__ SIGPROF ${0} ${LINENO} true 1 ' SIGPROF # 27 | |
# We have seen this - it informs a process that "its window size has changed"
# it happens when we resize the terminal window during debugging - leave it trapped for now | |
# so it could be logged ( it shouldn't be happening on an embedded system!!! ) | |
# | |
# I have amended the code so it doesn't write to the last-signal file | |
# it only outputs one line on the script's console and carries on. | |
# | |
# I have added code above so that it cannot cause a script abort | |
trap ' unexpectedTrap__ SIGWINCH ${0} ${LINENO} false 0 ' SIGWINCH # 28 | |
# @FIXME - see above | |
# But be careful how much run-time we add - this is an overhead at the start-up of ALL scripts. | |
# It could be better to do 'kill -l' on the first encounter only | |
# and store the flags in /tmp/kill-supports-these, or a global shell variable | |
# then test it with shell pattern-matching. | |
# | |
if kill -l | grep -q SIGPOLL | |
then | |
# Older script processors have this. | |
trap ' unexpectedTrap__ SIGPOLL ${0} ${LINENO} true 1 ' SIGPOLL # 29 | |
else | |
# The more recent bash script processors accept this. | |
trap ' unexpectedTrap__ SIGIO ${0} ${LINENO} true 1 ' SIGIO # 29 | |
fi | |
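# The "store the flags in a global shell variable, then test with shell pattern-matching" idea
# from the FIXME above might look like this.
# A sketch only: G_killSupports and signalIsSupported are hypothetical names, not defined elsewhere in this file.

```shell
#!/usr/bin/env bash
# Run 'kill -l' once and keep its output as a single space-separated, space-padded string.
G_killSupports=" $(kill -l | tr -s '[:space:]' ' ') "

# Hypothetical helper: succeed if the named signal ( given without the SIG prefix ) is listed.
signalIsSupported()
{
    case "${G_killSupports}" in
        *" SIG${1} "*|*" ${1} "*) return 0 ;;
        *) return 1 ;;
    esac
}

if signalIsSupported HUP
then
    echo "HUP is supported"
fi
```

# The padding spaces mean that only whole names match, so "INT" cannot be satisfied by part of a longer name.
# Both the "SIGHUP" and the bare "HUP" spellings are accepted, since shells differ in which one 'kill -l' prints.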
trap ' unexpectedTrap__ SIGPWR ${0} ${LINENO} true 1 ' SIGPWR # 30 | |
trap ' unexpectedTrap__ SIGSYS ${0} ${LINENO} true 1 ' SIGSYS # 31 | |
# on the N8000 kill -l lists the following which we do not yet bother with | |
# | |
# SIGRTMIN # 34 | |
# SIGRTMIN+1 # 35 | |
# SIGRTMIN+2 # 36 | |
# SIGRTMIN+3 # 37 | |
# SIGRTMIN+4 # 38 | |
# SIGRTMIN+5 # 39 | |
# SIGRTMIN+6 # 40 | |
# SIGRTMIN+7 # 41 | |
# SIGRTMIN+8 # 42 | |
# SIGRTMIN+9 # 43 | |
# SIGRTMIN+10 # 44 | |
# SIGRTMIN+11 # 45 | |
# SIGRTMIN+12 # 46 | |
# SIGRTMIN+13 # 47 | |
# SIGRTMIN+14 # 48 | |
# SIGRTMIN+15 # 49 | |
# SIGRTMAX-14 # 50 | |
# SIGRTMAX-13 # 51 | |
# SIGRTMAX-12 # 52 | |
# SIGRTMAX-11 # 53 | |
# SIGRTMAX-10 # 54 | |
# SIGRTMAX-9 # 55 | |
# SIGRTMAX-8 # 56 | |
# SIGRTMAX-7 # 57 | |
# SIGRTMAX-6 # 58 | |
# SIGRTMAX-5 # 59 | |
# SIGRTMAX-4 # 60 | |
# SIGRTMAX-3 # 61 | |
# SIGRTMAX-2 # 62 | |
# SIGRTMAX-1 # 63 | |
# SIGRTMAX # 64 | |
# You would think it would be "generally helpful" to force noticing of errors in scripts | |
# ( by invoking -e 'quit on error' ) | |
# but this means that ANYTHING which does exit 1 ( or any other non-zero value ) | |
# will immediately abort the script, | |
# even if we had every intention of testing the return code in the following line | |
# | |
# There's no way to say "only abort if I don't test the return value myself quite soon".... | |
# | |
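# BUT - there are ways around this. The abort ( and the ERR trap ) is suppressed
# for a command which forms the condition of an 'if' / 'while' / 'until',
# and for any command in an '&&' / '||' list other than the last one.
#
# Note that wrapping the command in a subshell does NOT help:
#
# ( command ) ; rc=$?
#
# still aborts, because the subshell as a whole returns the non-zero code.
#
# You can put the command in a "true/false" situation - so kludges like
#
# command && true # $? still holds the return code of 'command'
# command || true # $? is forced to 0
#
# will suppress the trap.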
# | |
# Having experimented a bit, we now prefer: | |
# if command; rc=$? ; then : ; fi | |
# | |
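# The preferred form can be tried out in isolation as follows.
# A minimal sketch: mightFail is a hypothetical stand-in for the real command,
# and everything runs in subshells so that no trap or option leaks into the including script.

```shell
#!/usr/bin/env bash
result="$(
    trap ' echo "ERR trap fired" ' ERR
    mightFail() { return 3 ; }      # hypothetical command which fails with code 3

    # The failing command is part of the 'if' condition, so neither the ERR trap nor 'set -e' reacts,
    # yet rc still records its return code.
    if mightFail; rc=$? ; then : ; fi
    echo "rc=${rc}"
)"
echo "${result}"

# For comparison: under 'set -e' the subshell form aborts before 'echo' is reached.
subshellForm="$(bash -ec '( false ) ; echo "survived"' || true)"
echo "subshell form printed: '${subshellForm}'"
```

# The first part prints only the return code, with no message from the ERR trap;
# the second part prints nothing between the quotes.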
# Exit ( and therefore trap ) on reading uninitialised variables. | |
setUpExitOnUninitialisedVariableTrapping | |
fi