@willwhitney
Created August 24, 2018 19:56
Fragment of Python code for catching signals from Slurm and restarting the job
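import os
import signal
import sys

# Fragment: `opt`, `epoch`, `env`, `mjdynamics`, and `save_dynamics` are defined elsewhere in the training script.
# depends on requesting SIGUSR1 in runner file: https://gist.github.com/willwhitney/e1509c86522896c6930d2fe9ea49a522
def handle_signal(signal_value, _):
    signame = signal.Signals(signal_value).name
    if signal_value == signal.SIGUSR1:
        print('Process {} got signal {}. Saving and restarting.'.format(
            os.getpid(), signame), flush=True)
        # Checkpoint first so the restarted job can resume from this epoch.
        save_dynamics(epoch)
        # Run the restart command, if one was provided.
        if opt.restart_command is not None:
            os.system(opt.restart_command)
        # Close the environment (and mjdynamics, if present) before exiting cleanly.
        env.close()
        if mjdynamics is not None:
            mjdynamics.close()
        sys.exit(0)
    else:
        print('Process {} got signal {}. Doing nothing.'.format(
            os.getpid(), signame), flush=True)

# SIGUSR1 triggers save-and-restart; SIGCONT and SIGTERM are only logged.
signal.signal(signal.SIGUSR1, handle_signal)
signal.signal(signal.SIGCONT, handle_signal)
signal.signal(signal.SIGTERM, handle_signal)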
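
Because the fragment depends on names defined elsewhere, it cannot be run on its own. The following is a minimal self-contained sketch of the same idea, not the author's code. It assumes a toy training loop, and `state`, `save_checkpoint`, and `train` are hypothetical names introduced only for illustration. It also swaps in a different technique: instead of calling `sys.exit(0)` inside the handler, the handler sets a flag that the loop checks. The process sends SIGUSR1 to itself to mimic the signal Slurm would deliver.

```python
import os
import signal
import time

# Hypothetical job state; the gist's real state lives in `epoch`, `env`, etc.
state = {"epoch": 0, "saved_epoch": None, "stop": False}


def save_checkpoint(epoch):
    """Hypothetical stand-in for the gist's save_dynamics(epoch)."""
    state["saved_epoch"] = epoch


def handle_signal(signal_value, _frame):
    """On SIGUSR1, checkpoint and ask the training loop to stop."""
    if signal_value == signal.SIGUSR1:
        save_checkpoint(state["epoch"])
        state["stop"] = True


def train(num_epochs, signal_at=None):
    """Toy loop; sends SIGUSR1 to itself at `signal_at` to mimic Slurm."""
    # Handlers can only be registered from the main thread.
    signal.signal(signal.SIGUSR1, handle_signal)
    for epoch in range(num_epochs):
        state["epoch"] = epoch
        if epoch == signal_at:
            os.kill(os.getpid(), signal.SIGUSR1)
            time.sleep(0.05)  # give the interpreter a chance to run the handler
        if state["stop"]:
            break
    return state["saved_epoch"]
```

Calling `train(5, signal_at=2)` runs the loop until the simulated signal arrives, then returns the epoch that was checkpointed. A real job would write the checkpoint to disk and issue its restart command after the loop exits.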
@jtourille

Interesting piece of code. Thanks for sharing.
