My two cents on debugging: if you have SSH access to some of your nodes, especially the one where an experiment (xp) failed, you can also debug the code remotely. This may seem a bit tricky, but it can help you inspect the pipeline while it runs. I tested it with VSCode.
- SSH into the machine and enter the environment (Docker container, etc.)
- Install `debugpy`: `pip install debugpy`
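Once `debugpy` is installed, the script needs to open a port that VSCode can attach to. Below is a minimal sketch of one way to do that. `debugpy.listen` and `debugpy.wait_for_client` are real `debugpy` calls. The `DEBUGPY_ENABLE` environment variable, the `maybe_start_debugpy` helper, and port 5678 are hypothetical choices for illustration. They let the listener stay off in normal runs and switch on only when you rerun the failed xp by hand. The sketch also assumes you reach the port through an SSH tunnel, so it listens on `127.0.0.1`. Inside a Docker container you may instead need `0.0.0.0` plus a published port.

```python
import os


def maybe_start_debugpy(host: str = "127.0.0.1", port: int = 5678, wait: bool = True) -> bool:
    """Start a debugpy listener only if the (hypothetical) DEBUGPY_ENABLE=1 env var is set.

    Returns True if the listener was started, False otherwise.
    """
    # Hypothetical opt-in switch so regular runs are unaffected.
    if os.environ.get("DEBUGPY_ENABLE") != "1":
        return False

    # Imported lazily so the script still runs where debugpy is not installed.
    import debugpy

    # Open the debug adapter port that VSCode will attach to.
    debugpy.listen((host, port))
    print(f"debugpy listening on {host}:{port}", flush=True)

    if wait:
        # Block here until VSCode attaches, so early breakpoints are not missed.
        debugpy.wait_for_client()
    return True


if __name__ == "__main__":
    maybe_start_debugpy()
    # ... the rest of the pipeline / training code would run here ...
```

With this in place, you would run the script on the node with `DEBUGPY_ENABLE=1` and forward the port from your machine, e.g. `ssh -L 5678:localhost:5678 user@node`. Then attach from VSCode using an "attach" configuration in `launch.json` that points at `localhost:5678`.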