@barryrowlingson
Last active April 8, 2022 16:21
Run R scripts in SGE
#!/bin/bash
. /etc/profile
usage () {
printf "Usage: %s: [-h] -m -d -c -n name -j jobname -b Bfile -e Efile <Rfile> [Rargs]\n" $(basename $0) >&2
}
mflag=
nflag=
bflag=
cflag=
dflag=
eflag=
jflag=
while getopts 'hmcdn:b:e:j:' OPTION
do
case $OPTION in
h) usage
cat <<EOF
Submits R jobs to the HEC
=========================
-h : show this help
-m : load current modules in script
-d : debug - just write out the batch job file, don't submit it
-b Bfile : file of extra shell directives/shell code to insert at beginning (may appear more than once)
-e Efile : file of extra shell script to insert at end (may appear more than once)
-j name : job name
-n name : creates a subdir called 'name' from the current directory for
output and error. The R process will start with this as its working
directory
-c : copy R file to the 'name' directory and run that.
<Rfile> : path to the R file to run.
[Rargs] : further arguments that can be picked up using commandArgs() in R.
EOF
exit 1
;;
j) jflag=1
jobname="$OPTARG"
;;
n) nflag=1
name="$OPTARG"
;;
b) bflag=1
bjobx="$OPTARG"
if [ ! -f $bjobx ] ; then
echo job begin file \"$bjobx\" not found >&2
exit 2
fi
bjobextras+=("$bjobx")
;;
e) eflag=1
ejobx="$OPTARG"
if [ ! -f "$ejobx" ] ; then
echo job end file \"$ejobx\" not found >&2
exit 2
fi
endextras+=("$ejobx")
;;
m) mflag=1
;;
c) cflag=1
;;
d) dflag=1
;;
?) usage
exit 2
;;
esac
done
shift $(($OPTIND - 1))
if [ ! "$jflag" ]
then
printf "No job name given\n" >&2
usage
exit 2
fi
if [ ! "$nflag" ]
then
printf "No output folder name argument given\n" >&2
usage
exit 2
fi
if test ${#} -lt 1 ; then
printf "No Rfile argument given\n" >&2
usage
exit 2
fi
Rfile=$1 ; shift
if [ -f $Rfile ] ; then
Rfile=$( readlink -f "$( dirname "$Rfile" )" )/$( basename "$Rfile" )
# echo R code is in $Rfile >&2
else
echo $Rfile not found >&2
usage
exit 2
fi
mkdir -p "$name"
if [ "$cflag" ] ; then
# run a private copy so later edits to the original do not affect the job
cp "$Rfile" "$name"
Rfile=$( readlink -f "$name" )/$( basename "$Rfile" )
fi
# default R if nothing else
module_loader="module add R"
if [ "$mflag" ] ; then
mods=`( module -t list 3>&1 1>&2- 2>&3- | tail -n +2 )`
module_loader="module load "$mods
fi
#echo commandArgs for R: "$*" >&2
(
cat <<EOF
#$ -S /bin/bash
#$ -N $jobname
#$ -o $name/output-\$JOB_NAME
#$ -e $name/errors-\$JOB_NAME
EOF
if [ "$bflag" ] ; then
for bj in ${bjobextras[@]}; do
cat $bj
done
fi
cat <<EOF
. /etc/profile
EOF
echo $module_loader
cat <<EOF
echo changing to $name
cd $name
R --vanilla --args $* < $Rfile
EOF
if [ "$eflag" ] ; then
for ej in ${endextras[@]}; do
cat $ej
done
fi
) | (
if [ "$dflag" ] ; then
cat
else
qsub -N $jobname
fi
)
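# test1.R: two-sample t-test on n random normal draws per sample
# n is the first argument after --args on the R command line
n = as.numeric(commandArgs(trailingOnly=TRUE)[1])
cat("sample size is ",n,"\n")
x1=rnorm(n)
x2=rnorm(n)
# compare the two samples (result named tt so it does not mask base::t)
tt = t.test(x1,x2)
print(tt)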
@barryrowlingson (Author)

This is a shell script to help submit R jobs to a Son of Grid Engine (SGE) batch system. To install it, copy the script to a directory on your path, and maybe rename it to drop the .sh.

Simple usage, something like:

 submitR -n test1 -j job100  test1.R 100

That makes a directory called test1 below the current directory and runs a job called job100 from test1.R in the current directory, passing 100 to R's command args. See the test1.R file for arg handling. You might then do:

 submitR -n test1 -j job1k test1.R 1000

to run with a sample size of 1000, storing the output in the same folder. The output and error files are named after the job, so the two runs do not overwrite each other.

Other goodies: the -b and -e options let you include snippets at the start and end of the batch control job. I use -b to modify the resource requirement flags for big memory jobs, and the -e flag to send notifications of job completion. So for example:

 submitR -n this -j job -b zmail.qsub foo.R

inserts the contents of zmail.qsub (two lines of qsub directives) at the start of the job script, which triggers an email notification when the job ends. Put your email address in the file, of course.
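The contents of zmail.qsub are not shown here. As a rough sketch, assuming the standard Grid Engine mail directives (`-m e` to send mail at end of job, `-M` for the recipient), a two-line file could be created like this. The address is a placeholder, and your site may want different options:

```shell
# Hypothetical zmail.qsub: two qsub directives, written with a quoted
# heredoc so the shell does not try to expand anything inside it.
cat > zmail.qsub <<'EOF'
#$ -m e
#$ -M you@example.com
EOF
```

Because the lines start with `#$`, the shell treats them as comments and only the batch system acts on them.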

The -c option copies the R script into the output directory and runs the copy. Without it, if you edit the R script after submission but before execution, the job will run the modified script.

Our cluster uses modules to activate R, compilers and other tools, so by default the job script gets a "module add R" line. However, if you use the -m option, the script captures your currently loaded module list and loads those modules on the compute nodes instead. This is handy if you want to run a non-default R, for example.
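The module capture relies on `module -t list` reporting on stderr with a header line first, which is what the script's stream swap and `tail -n +2` suggest. Here is a minimal sketch of that trick, using a stand-in function and made-up module names so it runs without the modules system. It uses the POSIX spelling of the swap, whereas the script uses bash's shorter `1>&2- 2>&3-` form:

```shell
# Stand-in for `module -t list` (hypothetical module names), writing to stderr.
fake_module_list() {
  printf 'Currently Loaded Modulefiles:\nR/4.1.0\ngcc/9.3.0\n' >&2
}

# Swap stdout and stderr so the pipe sees the listing, then drop the header.
mods=$( fake_module_list 3>&1 1>&2 2>&3 3>&- | tail -n +2 )

# Same construction as the script; the unquoted echo joins the lines with spaces.
module_loader="module load "$mods
echo $module_loader   # prints: module load R/4.1.0 gcc/9.3.0
```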

The -d option just prints the batch job file for your inspection, without submitting it.
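The way -d works is that the job text is always generated on stdout and piped either to `cat` or to `qsub`. This is a cut-down sketch of that pattern, not the full script; `emit_job`, `submit_or_show` and `DEBUG` are hypothetical names used only for illustration:

```shell
# Print a tiny job script for job name $1 and output directory $2.
emit_job() {
cat <<EOF
#\$ -S /bin/bash
#\$ -N $1
echo changing to $2
cd $2
EOF
}

# With DEBUG set, show the job text; otherwise hand it to qsub on stdin.
submit_or_show() {
  if [ "$DEBUG" ]; then
    cat
  else
    qsub -N "$1"
  fi
}

DEBUG=1
job_text=$( emit_job job100 test1 | submit_or_show job100 )
printf '%s\n' "$job_text"
```

With `DEBUG` set, nothing is submitted and the four generated lines are printed, starting with `#$ -S /bin/bash`.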
