GNU Parallel is a multipurpose program for running shell commands in parallel, and it can often replace shell script loops, find -exec, and find | xargs. It provides the --sshlogin and --sshloginfile options to farm out jobs to multiple hosts, as well as options for sending and retrieving static resources and per-job input and output files.
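As a rough illustration of those options, here is a minimal sketch of one way to farm out jobs. The host names, wordlist.txt, the grep job, the farm_out function, and the PARALLEL_BIN override are all hypothetical, chosen only for illustration. The GNU Parallel flags used are --sshlogin, --basefile, --transfer, --return, and --cleanup.

```shell
# Sketch only: host names and file names below are hypothetical placeholders.

# Comma-separated SSH logins to spread jobs across.
# (--sshloginfile FILE could be used instead, with one login per line.)
SSHLOGINS="node1.example.com,node2.example.com"

# farm_out FILE...: count wordlist matches in each input file on a remote host.
# PARALLEL_BIN is an assumed override: set it to "echo" to preview the command
# line without contacting any host.
farm_out() {
  "${PARALLEL_BIN:-parallel}" \
    --sshlogin "$SSHLOGINS" \
    --basefile wordlist.txt \
    --transfer \
    --return '{.}.out' \
    --cleanup \
    'grep -c -f wordlist.txt {} > {.}.out' \
    ::: "$@"
}

# --basefile copies a static resource to each host once,
# --transfer copies each job's input file ({}) to the remote host,
# --return fetches the job's output file ({.} is the input name minus extension),
# --cleanup removes the transferred and returned files from the remote host.
```

Calling farm_out data/*.txt would then run one grep job per input file and bring back one .out file for each. Even in this small case, the list of files to send, return, and clean up has to be spelled out by hand.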
For any particular task, however, keeping track of which files need to be pushed to and retrieved from the remote hosts is a hassle. Furthermore, cancelled or failed runs can leave garbage on the remote hosts, and if the input and output files are large, copying them to local disk on the remote hosts is inefficient.
In a traditional cluster, this problem would be solved by giving all nodes access to a shared filesystem, usually with NFS or something more exotic. However, NFS doesn't wo