- Apply for an allocation at NECTAR-RDS
- Set up 2 key pairs (one for 'root-designated' login and the other for 'general-user' login) and download the private keys to a secure location. If you're planning to use KiTTY or PuTTY to log in to the server from Windows, you'll need to convert the `.pem` key file to a `.ppk`, as explained here. You can (and should) password-protect this key (this can be done in PuTTYgen when converting the key from the default `.pem` format to a PuTTY-supported `.ppk`).
- Set up a customised security group to allow remote access (open port 22 for `ssh` logins and any additional ports if required).
- Start up an instance based on an Ubuntu image (either built-in, see examples below, or from a previous snapshot). When creating the instance, choose the 'general-user' key pair and the new security group created in the previous steps, and choose a flavour that matches your allocation and needs (I chose `m1.xlarge`). See instructions for steps 2-4 in this NECTAR tutorial.
- Log in to the server as root (the default root username and password is ubuntu for Ubuntu images) using PuTTY (Windows) or directly from the command line using `ssh` (macOS, Linux); see the connection guide for more details.
- Install essential packages, update the default profile files that will be applied to all new users and update Conda to work for multiple users (make the executables available for all users by default, see notes for Conda 4.4 and Administering a multi-user conda installation):
sudo apt-get update && sudo apt-get install tmux build-essential xvfb acl
sudo sed -i 's/ls -alF/ls -alFh/' /etc/skel/.bashrc
sudo mkdir /etc/skel/bin
sudo mkdir /etc/skel/sandbox
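To see what the `sed` edit above does before touching `/etc/skel`, it can be tried on a scratch copy. This is a minimal sketch: the sample alias line mimics the stock Ubuntu `.bashrc`, and the temporary file is an assumption used only for illustration.

```shell
# Try the alias rewrite on a scratch file instead of /etc/skel/.bashrc
scratch=$(mktemp)
printf "alias ll='ls -alF'\n" > "$scratch"   # sample line, as in Ubuntu's default .bashrc

# Same substitution as above: add -h so file sizes are human-readable
sed -i 's/ls -alF/ls -alFh/' "$scratch"

cat "$scratch"
```

Note that the pattern also matches a line that has already been edited, so running the substitution a second time would append another `h`.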
- Download and use the `setup_new_user.bash` script to add new users and set up their ssh folders and keys (assuming a single key is shared):
# Start by creating a sudo account to be used instead of root
mkdir ~/bin
curl -L -o ~/bin/setup_new_user \
https://gist.githubusercontent.com/IdoBar/5678faf8bde18a73fd4e7d9fd35db43f/raw/72cca55c57625a7fbddba901399546d7dc40bd33/setup_new_user.bash
chmod 754 ~/bin/setup_new_user
source ~/.profile
# First create an 'admin' user to be used instead of root; assign it to the same group as the future users/students ('students' in this case) and to the staff group ('admins')
setup_new_user <newsudouser> <students_group> <admins_group> sudo
# The new admin user will be added to the sudo group and will require a password
sudo passwd <newsudouser>
# Now add another 'anaconda' user that will be used to install conda
setup_new_user anaconda # <students_group> <admins_group> (add these if the user needs to belong to both the students and the teaching groups)
# make sure files and folders created in the conda installation are writable by the admins group
sudo setfacl -Rm d:g:admins:rwX,g:admins:rwX,o::rX /home/anaconda
# add any additional admin users
setup_new_user <newadminuser> <students_group> <admins_group>
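Once the accounts exist, it can be worth confirming that each one landed in the intended groups. A minimal sketch follows; the helper names `has_group` and `user_in_group` are hypothetical, and `students`/`admins` stand for whatever group names were passed to `setup_new_user`.

```shell
# has_group "<space-separated group list>" <group>: succeed if <group> is in the list
has_group() {
    # $1 is deliberately unquoted so that each group name lands on its own line
    printf '%s\n' $1 | grep -qx -- "$2"
}

# user_in_group <user> <group>: succeed if <user> exists and belongs to <group>
user_in_group() {
    local groups
    groups=$(id -nG "$1" 2>/dev/null) || return 1
    has_group "$groups" "$2"
}

# Example: report membership for the current user
for g in students admins sudo; do
    if user_in_group "$(id -un)" "$g"; then
        echo "$(id -un) is in $g"
    else
        echo "$(id -un) is NOT in $g"
    fi
done
```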
- Log in as the anaconda user to set up a new conda environment with all the needed packages (if not loaded from an existing snapshot/application). Follow the prompts and accept the defaults to install conda under the 'anaconda' user folder (`/home/anaconda/miniconda2`). With the current setup, all members of the `admins` group can also create new environments in the same installation (students can still set up their own environments, which will be installed under their `~/.conda/envs` folder).
# Start a new shell or tmux tab, then:
CONDA_ENV=nsc3030 # choose any appropriate name
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
# conda init
# activate conda
source ~/.bashrc
conda config --add channels conda-forge
conda config --append channels bioconda
conda config --set auto_stack 1
# conda config --append channels anaconda # not necessary
# install basic packages into the base environment
mamba install parallel git pigz libgcc gnutls libuuid rclone gawk
# to enable jupyter kernels
# mamba install irkernel calysto_bash
# create the environment with the requested packages
wget https://raw.githubusercontent.com/IdoBar/3030NSC-workshops-binder/main/.binder/environment.yml
mamba env create -n $CONDA_ENV -f environment.yml
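# alternatively (instead of the environment.yml above), create the environment from an explicit package list
# ('mamba create' is used here because 'mamba env create' expects an environment file, not package names)
mamba create -n $CONDA_ENV libgcc gnutls libuuid pandoc qt scipy rclone genozip \
ncurses readline git libgfortran pigz biopython etetoolkit::ete3 etetoolkit::ete_toolchain \
etetoolkit::ete3_external_apps blast entrez-direct emboss fastqc multiqc parallel gawk \
bioawk hmmer pfam_scan clustalw mustang fastqe ncbi-datasets-cli jq unzip # fasttree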
conda activate $CONDA_ENV
# make all the ete3-apps available (except fasttree)
mkdir -p $CONDA_PREFIX/etc/conda/activate.d # the hook folder may not exist yet
printf '#!/bin/sh\nexport PATH=$PATH:$CONDA_PREFIX/bin/ete3_apps/bin\nexport MAFFT_BINARIES=$CONDA_PREFIX/bin/ete3_apps/bin\n' \
  > $CONDA_PREFIX/etc/conda/activate.d/ete3_bin.sh
ete3 build check 2>&1 | grep MISSING | sed -r 's/ //g; s/:.+//' | \
parallel "ln -s $CONDA_PREFIX/bin/{} $CONDA_PREFIX/bin/ete3_apps/bin/"
ln -sf $CONDA_PREFIX/bin/ete3_apps/bin/Slr $CONDA_PREFIX/bin/ete3_apps/bin/slr
ln -sf $CONDA_PREFIX/bin/t_coffee $CONDA_PREFIX/bin/ete3_apps/bin/
ln -sf $CONDA_PREFIX/bin/ete3_apps/bin/t_coffee $CONDA_PREFIX/bin/ete3_apps/bin/tcoffee
# fix all perl executables to use the correct perl version
find $CONDA_PREFIX -name "*.pl" | parallel -q sed -i.bak 's|!/usr/bin/perl|!/usr/bin/env perl|' {}
# pin qt and pyqt versions to avoid downgrading them
printf "pyqt ==5.9.2\nqt ==5.9.6" > $CONDA_PREFIX/conda-meta/pinned
# remove archives and cache of downloaded packages
conda clean -y --all
# setup the environment location as default for future users
printf "envs_dirs:\n - %s\n" $(dirname $CONDA_PREFIX) >> $HOME/.condarc
# install additional tools (probcons, bioinformatics-hacks)
mkdir -p ~/etc/tools && cd ~/etc/tools
wget http://probcons.stanford.edu/probcons_v1_12.tar.gz && tar xzf probcons_v1_12.tar.gz && cd probcons
# fix compiler headers, see https://stackoverflow.com/questions/9403975/strcmp-was-not-declared-in-this-scope
parallel -q sed -i.bak 's/#include <string>/#include <string.h>/' ::: *.cc
make
find `pwd` -type f -executable | parallel ln -s {} $CONDA_PREFIX/bin/ # link to conda binaries folder
cd ~/etc/tools && git clone https://github.com/audy/bioinformatics-hacks.git && cd bioinformatics-hacks/bin
find `pwd` -type f -executable | parallel ln -s {} $CONDA_PREFIX/bin/ # link to conda binaries folder
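The `activate.d` hook used above is easier to reason about when tried in isolation. In this sketch a throwaway directory stands in for `$CONDA_PREFIX` (an assumption for illustration; no conda installation is touched), and the hook is sourced by hand, much as `conda activate` sources the `.sh` files in that folder.

```shell
# A scratch directory plays the role of the environment prefix
fake_prefix=$(mktemp -d)
mkdir -p "$fake_prefix/etc/conda/activate.d" "$fake_prefix/bin/ete3_apps/bin"

# Write the hook; single quotes keep $PATH and $CONDA_PREFIX unexpanded until activation
printf '%s\n' \
    '#!/bin/sh' \
    'export PATH=$PATH:$CONDA_PREFIX/bin/ete3_apps/bin' \
    'export MAFFT_BINARIES=$CONDA_PREFIX/bin/ete3_apps/bin' \
    > "$fake_prefix/etc/conda/activate.d/ete3_bin.sh"

# Simulate activation in a subshell so the current session is left unchanged
hook_result=$(
    CONDA_PREFIX=$fake_prefix
    . "$fake_prefix/etc/conda/activate.d/ete3_bin.sh"
    printf '%s\n' "$MAFFT_BINARIES"
)
echo "MAFFT_BINARIES would be set to: $hook_result"
```

Because the variables are expanded only when the hook runs, the same file keeps working if the environment is later cloned or moved to a different prefix.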
If there are issues with `conda` (such as the reported issues with v4.7), run the following and then continue installing packages (with `conda install` as above):
conda config --set allow_conda_downgrades true # only if issues
conda install conda=4.6.11
- Login as the admin user to setup conda for all future users:
cp /home/anaconda/.condarc $HOME/
sudo cp /home/anaconda/.condarc /etc/skel/.condarc
printf '; Start the section for BLAST configuration\n[BLAST]\n; Specifies the path where BLAST databases are installed\nBLASTDB=/mnt/shared/databases\n' > $HOME/.ncbirc
sudo cp $HOME/.ncbirc /etc/skel/.ncbirc
printf "cacert=/etc/ssl/certs/ca-certificates.crt\n" > $HOME/.curlrc
sudo cp $HOME/.curlrc /etc/skel/.curlrc
CONDA_HOME=$(grep "conda" /home/anaconda/.conda/environments.txt | head -n1)
sudo ln -s $CONDA_HOME/etc/profile.d/conda.sh /etc/profile.d/conda.sh
# if you want the conda env to be activated by default, add the following
# (a plain 'sudo echo ... >> file' fails because the redirection itself runs without sudo)
# echo "conda activate $CONDA_ENV" | sudo tee -a /etc/skel/.bashrc
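Before creating student accounts, it may help to confirm that the per-user files prepared above are present in the skeleton directory, since each new home directory is populated from it. A minimal sketch; `check_skel` is a hypothetical helper, and the directory is an argument so that it can be tried on a scratch folder first.

```shell
# check_skel [dir]: list expected dotfiles missing from dir (default /etc/skel)
check_skel() {
    local dir=${1:-/etc/skel} missing=0 f
    for f in .bashrc .condarc .ncbirc .curlrc; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

check_skel /etc/skel && echo "skeleton looks complete"
```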
Check this SO thread to see if the shared space can be managed better to automatically inherit the group permission for new files and directories
- Login as the admin user to setup a shared folder for databases and non-conda tools for all future users:
CONDA_ENV=nsc3030
sudo sh -c "mkdir -p /mnt/shared/databases && mkdir -p /mnt/shared/tools"
cd /mnt/shared/databases
source ~/.bashrc
conda activate $CONDA_ENV
sudo $(which update_blastdb.pl) --source GCS taxdb swissprot 16S_ribosomal_RNA
- Install additional sequence alignment databases, such as `bench` and `qscore` from https://www.drive5.com/bench/, or interproscan:
cd /mnt/shared/tools
# Download bench.tar.gz to this folder
wget https://www.drive5.com/bench/bench.tar.gz
tar xzvf bench.tar.gz && cd bench1.0
mkdir qscore_src && cd qscore_src
# Download qscore_src.tar.gz to this folder
wget https://www.drive5.com/qscore/qscore_src.tar.gz && tar xzvf qscore_src.tar.gz
# Fix qscore.h file, see https://www.biostars.org/p/288471/
sed -i.bak 's/#include <errno.h>/#include <errno.h>\n#include <climits>/' qscore.h
make && cp qscore ../
ln -s /mnt/shared/tools/bench1.0/qscore $CONDA_PREFIX/bin/ # link to conda binaries folder
# first install interproscan following the instructions at https://interproscan-docs.readthedocs.io/en/latest/HowToDownload.html
ln -s /mnt/shared/tools/interproscan-5.36-75.0/interproscan.sh $CONDA_PREFIX/bin/ # link to conda binaries folder
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib # needed for prosite to work in interproscan (before running)
# enable write access for admins and read and execute access for users/students group
sudo chgrp -R admins /mnt/shared/
# sudo chown -R :admins /mnt/shared
sudo setfacl -Rm d:g:admins:rwX,g:admins:rwX,o::rX /mnt/shared
sudo find /mnt/shared -type f ! -executable -exec chmod 664 {} + -o -type d -exec chmod 775 {} +
# setfacl -m g:students:rX -R /mnt/shared
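The intent stated in the comments above (group write access, with read and execute for everyone else) can be rehearsed on a scratch tree before touching `/mnt/shared`. This sketch assumes GNU `find` and `stat`, as shipped with Ubuntu; the file names are made up for the demonstration.

```shell
# Build a small scratch tree: one tool (executable) and one data file
tree=$(mktemp -d)
mkdir -p "$tree/tools" "$tree/databases"
printf '#!/bin/sh\necho hi\n' > "$tree/tools/demo_tool.sh"
chmod 700 "$tree/tools/demo_tool.sh"
echo "ACGT" > "$tree/databases/demo.fa"
chmod 600 "$tree/databases/demo.fa"

# Directories and executables: rwxrwxr-x; all other files: rw-rw-r--
find "$tree" \( -type d -o -type f -executable \) -exec chmod 775 {} +
find "$tree" -type f ! -executable -exec chmod 664 {} +

# Show the resulting modes
find "$tree" -exec stat -c '%a %n' {} +
```

Keeping the execute bit on tools matters here, because anything linked into the conda `bin` folder has to stay runnable by the students.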
- Use the `setup_new_user_par.bash` script to add new users and set up their ssh folders and keys (assuming a single key is shared and requiring GNU parallel):
# download the script again to the new user bin folder
curl -L -o ~/bin/setup_new_user_par \
https://gist.githubusercontent.com/IdoBar/5678faf8bde18a73fd4e7d9fd35db43f/raw/efc24bd2ef1c4d74cdd789e2a8f0b8acf80d0374/setup_new_user_par.bash && chmod 754 ~/bin/setup_new_user_par
# create a new env_parallel session, see [env_parallel](https://www.gnu.org/software/parallel/env_parallel.html)
. `which env_parallel.bash`
env_parallel --session
# store sudo password as an environment variable
read -s -p "Enter Password for sudo: " sudoPW
# read the list of users from a file and assign them to a group ('students' in this case)
cat users_list.txt | env_parallel --env sudoPW -j1 setup_new_user_par {} students
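Because each line of `users_list.txt` is passed straight to the script, a stray blank line or malformed name could create an unwanted account. A minimal pre-flight check is sketched below; it assumes one username per line and applies a conservative lowercase naming pattern (an assumption, adjust it to local policy). `invalid_users` is a hypothetical helper name.

```shell
# invalid_users <file>: print lines that are blank, malformed or duplicated
invalid_users() {
    # report anything that is not a lowercase letter or underscore followed by
    # lowercase letters, digits, underscores or hyphens (prefixed with its line number)
    grep -vnE '^[a-z_][a-z0-9_-]*$' "$1" | sed 's/^/malformed line /'
    # report names that appear more than once
    sort "$1" | uniq -d | sed 's/^/duplicate name: /'
}

# Example with a scratch list containing a blank line, a bad name and a repeat
list=$(mktemp)
printf 'alice\nbob\n\nBad Name\nalice\n' > "$list"
invalid_users "$list"
```

An empty result means the list passed both checks and can be piped to `env_parallel` as shown above.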
- Log in as root (ubuntu for Ubuntu images) and edit the `.ssh/authorized_keys` file to replace the 'general-user' public key with the 'root-designated' public one. You can (and should) password-protect this key (this can be done in PuTTYgen along with converting the key from the default `.pem` format to a PuTTY-supported `.ppk`).
Additional conda packages and other tools can be installed by the 'admin' user if needed.
After this initial setup, the users will need a copy of the 'general-user' private key (preferably delivered on a USB stick; do not email it!) and its associated password (if set). Each user can then use PuTTY or `ssh` to log in, check the shared conda environments with `conda env list`, then choose the one to use with `conda activate <env-name>`.
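For users connecting with `ssh` from macOS or Linux, the connection details can be stored once in an OpenSSH client configuration file. The sketch below writes such an entry to a scratch file; the alias `nsc-server`, the address `203.0.113.10` (a documentation-only IP), the username `student1` and the key file name are all placeholders, not values from this guide.

```shell
# Write a sample OpenSSH client entry to a scratch file (not ~/.ssh/config)
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host nsc-server
    HostName 203.0.113.10
    User student1
    IdentityFile ~/.ssh/general-user.pem
    IdentitiesOnly yes
EOF

# ssh refuses private keys that other users can read, so the key itself
# should be restricted as well, e.g.: chmod 600 ~/.ssh/general-user.pem
chmod 600 "$cfg"
cat "$cfg"
```

Once an entry like this is appended to `~/.ssh/config` with the real values filled in, `ssh nsc-server` is enough to connect.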
KiTTY or Portable PuTTY saves its session information to disk rather than to the Windows registry, so the admin can set up the connection details (IP address, ssh key location, etc.), save the session and distribute it as a portable version for the students to use on shared computers.
Updated to start from a clean Ubuntu image, rather than using a pre-built Bioconda application, for easier permission handling (otherwise conda comes installed under the `ubuntu` user home folder).