This gist provides a script that automates installing Sun Grid Engine (SGE) on a single EC2 machine.
SGE is a job scheduler for computing clusters, which usually consist of multiple machines. However, many applications don't need a massive cluster; 8-30 nodes would be sufficient. In this tutorial we set up SGE on a single Amazon EC2 machine. The reasons for doing so are as follows:
- Automation: Setting up a cluster with SGE is fairly involved, as it requires multiple machines that communicate with each other and share some memory. A single machine with multiple cores is already a simple cluster in which memory is shared across cores.
- Moderate Size: Amazon EC2 instances provide a variety of computing options with the number of cores ranging from 1 to 128.
- Cost: The On-Demand price structure of AWS makes this a relatively cheap option. Further cost reduction can be achieved by using spot instances.
This tutorial assumes the following:
- User has an AWS account
- User can start an Amazon EC2 instance
- User can SSH into a started EC2 instance
- (Optional) User has downloaded all needed software/packages. For users working with R, there are numerous publicly available scripts that automate installing R and some required packages. Alternatively, I recommend using an Amazon Machine Image (AMI) that comes with R pre-installed; my personal favorite is the one by Louis Aslett.
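Once logged in, a quick preflight check can confirm that the tools used below are available and show how many cores the instance has. The following is only a sketch: the function names `missing_commands` and `core_count` are hypothetical and not part of the gist, and treating each core as one potential job slot is an assumption about how the install script configures SGE.

```shell
# Print every command from the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Number of cores visible to this instance.
# Assumption: one core corresponds to one job slot.
core_count() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}
```

For example, `missing_commands git` prints nothing when git is installed, and `core_count` prints the number of cores.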
Once you have SSH'ed into your instance, run the following commands:
git clone https://gist.github.com/9d14da97d9ad1f8eccc36dc14390e4e0.git files/
cd files
sudo chmod +x install_sge.sh loop.sh sleep.sh
./install_sge.sh
./loop.sh
- The first command clones the gist, downloading all the files we will need into a folder named files. You can use a different folder if you like.
- The second changes the directory to where we downloaded the files.
- The third makes the scripts executable.
- The fourth runs the install script, which will ask permission to install files; say Y to all.
- The fifth sends a couple of jobs to the cluster to test it.
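The contents of loop.sh are not shown here, so the following is not the author's script. It is a minimal sketch of how a handful of test jobs could be submitted with `qsub`, under stated assumptions: the function name `submit_jobs`, the `SUBMIT` override variable, and the job names `test_1`, `test_2`, ... are all hypothetical. The `-cwd` flag runs each job in the current directory and `-N` sets the job name.

```shell
# Submit the same script N times.
# SUBMIT defaults to qsub; set it to another command for a dry run.
submit_jobs() {
  n=$1
  script=$2
  i=1
  while [ "$i" -le "$n" ]; do
    # -cwd: run the job from the current working directory
    # -N:   give the job a readable name
    "${SUBMIT:-qsub}" -cwd -N "test_$i" "$script"
    i=$((i + 1))
  done
}
```

With SGE installed, `submit_jobs 5 sleep.sh` would queue five copies of sleep.sh.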
Running loop.sh is optional, but it is a good way to check that the cluster is working. Once the jobs have been submitted, we can check their status by running
qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
[email protected] BIP 0/4/4 0.04 lx26-amd64
14 0.50000 sleep.sh ubuntu r 12/02/2016 20:55:09 1
15 0.50000 sleep.sh ubuntu r 12/02/2016 20:55:09 1
16 0.50000 sleep.sh ubuntu r 12/02/2016 20:55:09 1
17 0.50000 sleep.sh ubuntu r 12/02/2016 20:55:09 1
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
18 0.50000 sleep.sh ubuntu qw 12/02/2016 20:54:47 1
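For scripted checks, the job lines in the output above can be counted by state. This is a rough sketch that assumes the column layout shown here: a numeric job ID in the first column and the state (r for running, qw for queued and waiting) in the fifth. The function name `count_jobs_by_state` is hypothetical.

```shell
# Read qstat output on stdin and count the jobs in the given state.
# Assumes job lines start with a numeric ID and carry the state in column 5.
count_jobs_by_state() {
  awk -v state="$1" '$1 ~ /^[0-9]+$/ && $5 == state { n++ } END { print n + 0 }'
}
```

It can be used as `qstat -f | count_jobs_by_state r` for running jobs, or with `qw` for pending ones.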