A quick overview of how to run longjob on DICE, focusing here on keeping a Jupyter notebook open for a long period of time (28 days).
- ssh through the network gateway:
$ ssh <username>@<gateway>
and from there into whichever compute server you wish to use.
- Create a screen session so that the process is not killed after you log out:
$ screen -S <session-name> # name the screen session
$ screen -S mlp # e.g.
This opens a new screen terminal (the tutorial linked at the end has more on screen).
- Activate your virtual environment (here a conda environment named mlp):
$ source activate mlp
- Start the longjob:
$ longjob -28day -c <jobname/command> # start a long job executing the jobname/command
$ longjob -28day -c "(nohup nice -n 19 jupyter notebook --no-browser --port=<remoteport>)"
where <remoteport> is the port on which the notebook should listen on the remote server.
- If you run into problems generating a Kerberos key, it is possible that a key is already cached. The output might look like this:
Waiting for job to start...
krenew: unable to run command (nohup: No such file or directory
krenew: error reading ticket cache: No credentials cache found (filename: /tmp/krb5cc_14asdas42427_GBltpqweqweqeF)
krenew: cannot destroy ticket cache: No credentials cache found (filename: /tmp/krb5cc_14asdas42427_GBltasdasdqwepF)
Because of how the longjob command works (see [1]), we can destroy the current key so that a new one is obtained with kinit:
(mlp) $ kdestroy # destroy the cached Kerberos tickets
Then try running the longjob command again.
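The kdestroy-and-retry step can be wrapped in a small shell function. This is a minimal sketch under stated assumptions: `start_notebook_longjob` is a hypothetical helper name (not a DICE command), and it assumes `longjob` exits with a non-zero status when the job fails to start, which is not confirmed here. If `longjob` instead stays in the foreground for the life of the job, a notebook that exits with an error later would also trigger a kdestroy and a restart.

```shell
#!/usr/bin/env bash
# Minimal sketch: start the notebook under longjob; if that fails, clear the
# cached Kerberos tickets with kdestroy and try exactly once more.
# start_notebook_longjob is a hypothetical helper, not a DICE command.
# ASSUMPTION: longjob returns a non-zero exit status when the job fails.

start_notebook_longjob() {
  local remoteport="$1"  # port for the notebook on the remote server
  local cmd="(nohup nice -n 19 jupyter notebook --no-browser --port=${remoteport})"

  # First attempt, using the same command string as in the steps above.
  if longjob -28day -c "$cmd"; then
    return 0
  fi

  # Possibly a stale cached ticket: destroy it, then retry once.
  echo "longjob failed; running kdestroy and retrying" >&2
  kdestroy
  longjob -28day -c "$cmd"
}
```

Run it inside the screen session with the environment activated, e.g. `start_notebook_longjob 8888` (8888 is only an example port).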
Here is a good tutorial on screen: https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/
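Because the notebook is started with `--no-browser` on `<remoteport>`, you still need a way to reach it from your own machine. One common approach, sketched here as an assumption rather than as part of the longjob workflow, is an SSH local port forward that hops through the same network gateway using OpenSSH's `-J` option (OpenSSH 7.3 or later). `open_notebook_tunnel` is a hypothetical helper, and the user name and host names are placeholders you supply.

```shell
#!/usr/bin/env bash
# Minimal sketch: forward <localport> on your machine to <remoteport> on the
# compute server, hopping through the network gateway.
# open_notebook_tunnel is a hypothetical helper; all arguments are placeholders.

open_notebook_tunnel() {
  local user="$1" gateway="$2" server="$3" localport="$4" remoteport="$5"
  # -N: do not run a remote command, only forward ports
  # -L: connections to localhost:<localport> here are sent to
  #     localhost:<remoteport> as seen from the compute server
  # -J: reach the compute server via the gateway (ProxyJump)
  ssh -N \
    -L "${localport}:localhost:${remoteport}" \
    -J "${user}@${gateway}" \
    "${user}@${server}"
}
```

For example, `open_notebook_tunnel <username> <gateway> <server> 8888 8888`, then browse to http://localhost:8888/. Running `jupyter notebook list` on the compute server shows the running notebook's URL, including its access token.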