@IdoBar
Last active March 7, 2024 14:32
Setting up a NECTAR Cloud multi-user teaching server using conda

Setting up a cloud-based multi-user teaching server using conda

Setup steps

  1. Apply for an allocation at NECTAR-RDS
  2. Set up 2 key-pairs (one for 'root-designated' login and the other for 'general-user' login) and download the private keys to a secure location. If you're planning to use KiTTY or PuTTY to log in to the server from Windows, you'll need to convert the .pem key file to a .ppk, as explained here. You can (and should) password-protect this key (this can be done in PuTTYgen when converting the key from the default .pem format to a PuTTY-supported .ppk).
  3. Set up a customised security group to allow remote access (open port 22 for ssh logins, plus any additional ports if required)
  4. Start up an instance based on an Ubuntu image (either built-in, see the examples below, or from a previous snapshot). When creating the instance, choose the 'general-user' key pair and the security group that you created in the previous steps, and choose a flavour that matches your allocation and needs (I chose m1.xlarge). See instructions for steps 2-4 in this NECTAR tutorial.
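
On macOS or Linux, ssh refuses a private key that other users can read, so the key downloaded in step 2 should have its permissions tightened before the first login. The following is only a minimal sketch: `secure_key` is a hypothetical helper that is not part of this gist, and the key file name and address in the usage comment are placeholders.

```shell
# Hypothetical helper (not part of the original gist): restrict a downloaded
# private key to its owner, since ssh rejects keys readable by other users.
secure_key() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "key file not found: $key" >&2
        return 1
    fi
    chmod 600 "$key"
}

# Example usage (placeholder file name and address):
#   secure_key ~/.ssh/general-user.pem
#   ssh -i ~/.ssh/general-user.pem ubuntu@<instance-ip>
```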

NECTAR Instances

  1. Log in to the server as root (the default root username and password is ubuntu for Ubuntu images) using PuTTY (Windows) or directly from the command line using ssh (macOS, Linux); see the connection guide for more details.
  2. Install essential packages, update the default profile files that will be applied to all new users and update Conda to work for multiple users (make the executables available for all users by default, see notes for Conda 4.4 and Administering a multi-user conda installation):
sudo apt-get update && sudo apt-get install tmux build-essential xvfb acl
sudo sed -i 's/ls -alF/ls -alFh/' /etc/skel/.bashrc
sudo mkdir /etc/skel/bin
sudo mkdir /etc/skel/sandbox
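
Everything placed in /etc/skel is copied into the home folder of each user created afterwards, which is why the profile tweaks are made there. The commands above fail (mkdir) or stack up (sed) if they are run a second time, so a re-runnable variant might look like the sketch below. `prepare_skel` is a hypothetical helper, and it assumes the stock Ubuntu alias line `alias ll='ls -alF'` and GNU sed.

```shell
# Hypothetical helper: prepare a skeleton directory in a way that is safe to
# run more than once. The target defaults to /etc/skel (needs sudo there).
prepare_skel() {
    skel_dir="${1:-/etc/skel}"
    # mkdir -p succeeds even if the folders already exist
    mkdir -p "$skel_dir/bin" "$skel_dir/sandbox"
    # Add human-readable sizes to the 'll' alias. Matching the closing quote
    # means a second run finds nothing left to change.
    if [ -f "$skel_dir/.bashrc" ]; then
        sed -i "s/ls -alF'/ls -alFh'/" "$skel_dir/.bashrc"
    fi
}

# Example usage:
#   sudo sh -c "$(declare -f prepare_skel); prepare_skel /etc/skel"   # bash only
```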
  3. Download and use the setup_new_user.bash script to add new users and set up their ssh folders and keys (assuming a single key is shared):
    --Start by creating a sudo account to be used instead of root--
mkdir ~/bin
curl -L -o ~/bin/setup_new_user \
https://gist.githubusercontent.com/IdoBar/5678faf8bde18a73fd4e7d9fd35db43f/raw/72cca55c57625a7fbddba901399546d7dc40bd33/setup_new_user.bash
chmod 754 ~/bin/setup_new_user
source ~/.profile
# First create an 'admin' user that will be used instead of root, and assign it to the same group as the future users/students ('students' in this case) and to the staff group ('admins')
setup_new_user <newsudouser> <students_group> <admins_group> sudo
# The new admin user will be added to the sudo group and will require a password
sudo passwd <newsudouser>
# Now add another 'anaconda' user that will be used to install conda
setup_new_user anaconda # <students_group> <admins_group> (if need to add the user to both the teaching and students groups) 
# to make sure files and folders created in the miniconda environment will have admin group writing permissions
sudo setfacl -Rm d:g:admins:rwX,g:admins:rwX,o::rX /home/anaconda 
# add any additional admin users
setup_new_user <newadminuser> <students_group> <admins_group>
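
After creating the accounts it is worth confirming that each one ended up in the intended groups, since the ACLs used later depend on group membership. A possible check is sketched below; `in_group` is a hypothetical helper that is not part of the gist scripts.

```shell
# Hypothetical helper: succeed if the given user is a member of the given group.
in_group() {
    user="$1"
    group="$2"
    # id -nG prints the user's group names separated by spaces
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

# Example usage (placeholders as in the commands above):
#   in_group <newsudouser> sudo && echo "sudo access OK"
#   in_group anaconda admins || echo "anaconda is not in the admins group"
```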
  4. Log in as the anaconda user to set up a new conda environment with all the needed packages (if not loaded from an existing snapshot/application). Follow the prompts and accept the defaults to install conda under the 'anaconda' user folder (the Miniforge3 installer used below defaults to /home/anaconda/miniforge3). With this setup, all members of the admins group can also create new environments in the same installation (students can still set up their own environments, which will be installed under their ~/.conda/envs folder).
# Start a new shell or tmux tab, then:
CONDA_ENV=nsc3030 # choose any appropriate name
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
# conda init
# activate conda
source ~/.bashrc
conda config --add channels conda-forge
conda config --append channels bioconda 
conda config --set auto_stack 1
# conda config --append channels anaconda # not necessary
# install basic packages into the base environment
mamba install parallel git pigz libgcc gnutls libuuid rclone gawk
# to enable jupyter kernels
# mamba install irkernel calysto_bash
# create the environment with the requested packages, either from an environment file...
wget https://raw.githubusercontent.com/IdoBar/3030NSC-workshops-binder/main/.binder/environment.yml
mamba env create -n $CONDA_ENV -f environment.yml
# ...or by listing the packages explicitly (use only ONE of the two options;
# note that package names are passed to 'mamba create', not 'mamba env create')
mamba create -n $CONDA_ENV libgcc gnutls libuuid pandoc qt scipy rclone genozip \
    ncurses readline git libgfortran pigz biopython etetoolkit::ete3 etetoolkit::ete_toolchain \
    etetoolkit::ete3_external_apps blast entrez-direct emboss fastqc multiqc parallel gawk \
    bioawk hmmer pfam_scan clustalw mustang fastqe ncbi-datasets-cli jq unzip # fasttree
conda activate $CONDA_ENV
# make all the ete3-apps available (except fasttree)
mkdir -p $CONDA_PREFIX/etc/conda/activate.d # the folder may not exist in a fresh environment
printf '#!/bin/sh\nexport PATH=$PATH:$CONDA_PREFIX/bin/ete3_apps/bin\nexport MAFFT_BINARIES=$CONDA_PREFIX/bin/ete3_apps/bin\n' \
    > $CONDA_PREFIX/etc/conda/activate.d/ete3_bin.sh
ete3 build check 2>&1 | grep MISSING | sed -r 's/ //g; s/:.+//' | \
   parallel "ln -s $CONDA_PREFIX/bin/{} $CONDA_PREFIX/bin/ete3_apps/bin/"
ln -sf $CONDA_PREFIX/bin/ete3_apps/bin/Slr $CONDA_PREFIX/bin/ete3_apps/bin/slr
ln -sf $CONDA_PREFIX/bin/t_coffee $CONDA_PREFIX/bin/ete3_apps/bin/
ln -sf $CONDA_PREFIX/bin/ete3_apps/bin/t_coffee $CONDA_PREFIX/bin/ete3_apps/bin/tcoffee
# fix all perl executables to use the correct perl version
find $CONDA_PREFIX -name "*.pl" | parallel -q sed -i.bak 's|!/usr/bin/perl|!/usr/bin/env perl|' {}
# pin qt and pyqt versions to avoid downgrading them
printf "pyqt ==5.9.2\nqt ==5.9.6" > $CONDA_PREFIX/conda-meta/pinned
# remove archives and cache of downloaded packages
conda clean -y --all
# setup the environment location as default for future users
printf "envs_dirs:\n  - %s\n" $(dirname $CONDA_PREFIX) >> $HOME/.condarc
# install additional tools (probcons, bioinformatics-hacks)
mkdir -p ~/etc/tools && cd ~/etc/tools
wget http://probcons.stanford.edu/probcons_v1_12.tar.gz && tar xzf probcons_v1_12.tar.gz && cd probcons
# fix compiler headers, see https://stackoverflow.com/questions/9403975/strcmp-was-not-declared-in-this-scope
parallel -q sed -i.bak 's/#include <string>/#include <string.h>/' ::: *.cc  
make
find `pwd` -type f -executable | parallel  ln -s {} $CONDA_PREFIX/bin/ # link to conda binaries folder
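# go back to the tools folder so the repository is not cloned inside the probcons folder
cd ~/etc/tools && git clone https://github.com/audy/bioinformatics-hacks.git && cd bioinformatics-hacks/bin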
find `pwd` -type f -executable | parallel  ln -s {} $CONDA_PREFIX/bin/ # link to conda binaries folder
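
Because several tools above are linked into the environment by hand, a quick check that they all resolve on the PATH can catch a broken link before the students do. This is only a sketch: `check_tools` is a hypothetical helper, and the tool names in the usage comment are examples taken from the packages installed above.

```shell
# Hypothetical helper: report which of the given commands are not on the PATH.
# Returns 0 only if every command was found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (with the environment activated):
#   check_tools blastn fastqc multiqc ete3 probcons
```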

If there are issues with conda (such as the reported issues with v4.7), run the following and then continue installing packages (with the install commands above):

conda config --set allow_conda_downgrades true # only if issues 
conda install conda=4.6.11
  5. Log in as the admin user to set up conda for all future users:
cp /home/anaconda/.condarc $HOME/
sudo cp /home/anaconda/.condarc /etc/skel/.condarc
printf '; Start the section for BLAST configuration\n[BLAST]\n; Specifies the path where BLAST databases are installed\nBLASTDB=/mnt/shared/databases\n' > $HOME/.ncbirc
sudo cp $HOME/.ncbirc /etc/skel/.ncbirc
printf "cacert=/etc/ssl/certs/ca-certificates.crt\n" > $HOME/.curlrc
sudo cp $HOME/.curlrc /etc/skel/.curlrc
CONDA_HOME=$(grep "conda" /home/anaconda/.conda/environments.txt | head -n1)
sudo ln -s $CONDA_HOME/etc/profile.d/conda.sh /etc/profile.d/conda.sh
# if you want the conda env to be activated by default, add the following
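# (the redirection must run as root, so pipe to 'sudo tee' rather than using 'sudo echo ... >>')
# echo "conda activate $CONDA_ENV" | sudo tee -a /etc/skel/.bashrc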
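
The .condarc copied to /etc/skel carries the envs_dirs entry that lets new users see the shared environments. That entry was appended with `>>` in the previous step, so repeating the step would write it twice. A guarded variant is sketched below; `add_envs_dir` is a hypothetical helper, and it assumes the file has no other envs_dirs section.

```shell
# Hypothetical helper: append an envs_dirs entry to a .condarc file only if
# that directory is not already listed in it.
add_envs_dir() {
    envs_dir="$1"
    condarc="${2:-$HOME/.condarc}"
    if grep -qxF -- "  - $envs_dir" "$condarc" 2>/dev/null; then
        return 0   # already configured, nothing to do
    fi
    printf 'envs_dirs:\n  - %s\n' "$envs_dir" >> "$condarc"
}

# Example usage (as the anaconda user, with the environment activated):
#   add_envs_dir "$(dirname "$CONDA_PREFIX")"
```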

Check this SO thread to see if the shared space can be managed better to automatically inherit the group permission for new files and directories

  6. Log in as the admin user to set up a shared folder for databases and non-conda tools for all future users:
CONDA_ENV=nsc3030
sudo sh -c "mkdir -p /mnt/shared/databases && mkdir -p /mnt/shared/tools"
cd /mnt/shared/databases
source ~/.bashrc
conda activate $CONDA_ENV
sudo $(which update_blastdb.pl) --source GCS taxdb swissprot 16S_ribosomal_RNA
  7. Install additional sequence alignment databases and tools, such as bench and qscore from https://www.drive5.com/bench/ or interproscan:
cd /mnt/shared/tools
# Download bench.tar.gz to this folder
wget https://www.drive5.com/bench/bench.tar.gz
tar xzvf bench.tar.gz && cd bench1.0
mkdir qscore_src && cd qscore_src
# Download qscore_src.tar.gz to this folder
wget https://www.drive5.com/qscore/qscore_src.tar.gz && tar xzvf qscore_src.tar.gz
# Fix qscore.h file, see https://www.biostars.org/p/288471/
sed -i.bak 's/#include <errno.h>/#include <errno.h>\n#include <climits>/' qscore.h
make && cp qscore ../
ln -s /mnt/shared/tools/bench1.0/qscore $CONDA_PREFIX/bin/ # link to conda binaries folder
# first install interproscan following the instructions at https://interproscan-docs.readthedocs.io/en/latest/HowToDownload.html
ln -s /mnt/shared/tools/interproscan-5.36-75.0/interproscan.sh $CONDA_PREFIX/bin/ # link to conda binaries folder
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib # needed for prosite to work in interproscan (before running)
# enable write access for admins and read and execute access for users/students group
sudo chgrp -R admins /mnt/shared/ 
# sudo chown -R :admins /mnt/shared  
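# capital X keeps directories traversable and executables runnable
sudo setfacl -Rm d:g:admins:rwX,g:admins:rwX,o::rX /mnt/shared
# directories and executables get 775, all other files get 664
sudo find /mnt/shared \( -type d -o -type f -executable \) -exec chmod 775 {} + -o -type f -exec chmod 664 {} +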
# setfacl -m g:students:rX -R /mnt/shared
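
The capital X in permission strings grants execute only to directories and to files that are already executable by someone, which is what keeps shared folders traversable without making every data file executable. chmod uses the same convention, so the effect can be tried safely in a throw-away folder without sudo. The sketch below is only a demonstration; `demo_capital_x` and its file names are made up for illustration.

```shell
# Demonstration in a temporary directory (nothing under /mnt/shared is touched).
demo_capital_x() {
    demo_dir=$(mktemp -d)
    mkdir "$demo_dir/db"
    printf 'data\n' > "$demo_dir/db/seqs.fasta"
    printf '#!/bin/sh\necho hi\n' > "$demo_dir/db/tool.sh"
    # start from owner-only permissions
    chmod 700 "$demo_dir" "$demo_dir/db" "$demo_dir/db/tool.sh"
    chmod 600 "$demo_dir/db/seqs.fasta"
    # grant read to group/others, and execute only where X applies
    chmod -R go+rX "$demo_dir"
    ls -l "$demo_dir/db" | awk 'NR>1 {print substr($1,1,10), $NF}'
    ls -ld "$demo_dir/db" | awk '{print substr($1,1,10), "db/"}'
    rm -rf "$demo_dir"
}

demo_capital_x
# prints:
#   -rw-r--r-- seqs.fasta
#   -rwxr-xr-x tool.sh
#   drwxr-xr-x db/
```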
  8. Use the setup_new_user_par.bash script to add new users and set up their ssh folders and keys (assuming a single key is shared; requires GNU parallel):
# download the script again to the new user bin folder
curl -L -o ~/bin/setup_new_user_par \
   https://gist.githubusercontent.com/IdoBar/5678faf8bde18a73fd4e7d9fd35db43f/raw/efc24bd2ef1c4d74cdd789e2a8f0b8acf80d0374/setup_new_user_par.bash && chmod 754 ~/bin/setup_new_user_par
# create a new env_parallel session, see [env_parallel](https://www.gnu.org/software/parallel/env_parallel.html)
. `which env_parallel.bash`
env_parallel --session
# store sudo password as an environment variable
read -s -p "Enter Password for sudo: " sudoPW
# read the list of users from a file and assign them to a group ('students' in this case)
cat users_list.txt | env_parallel --env sudoPW -j1 setup_new_user_par {} students
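
Since every line of users_list.txt becomes an account, it can help to filter the file first. The sketch below skips blank lines and `#` comments, tolerates Windows line endings, and warns about names that do not look like conventional lowercase Linux usernames. `valid_users` is a hypothetical helper and the pattern it uses is an assumption, not the exact rule enforced by adduser.

```shell
# Hypothetical pre-check: print only the usable names from a users file.
valid_users() {
    # strip carriage returns, then read line by line (the '|| [ -n ]' part
    # keeps a last line that has no trailing newline)
    tr -d '\r' < "$1" | while IFS= read -r name || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if printf '%s\n' "$name" | grep -Eq '^[a-z][a-z0-9_-]*$'; then
            printf '%s\n' "$name"
        else
            printf 'skipping invalid username: %s\n' "$name" >&2
        fi
    done
}

# Example usage, replacing 'cat users_list.txt' in the command above:
#   valid_users users_list.txt | env_parallel --env sudoPW -j1 setup_new_user_par {} students
```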
  9. Log in as root (ubuntu for Ubuntu images) and edit the .ssh/authorized_keys file to replace the 'general-user' public key with the 'root-designated' public one. You can (and should) password-protect this key (this can be done in PuTTYgen along with converting the key from the default .pem format to a PuTTY-supported .ppk).

Additional conda packages and other tools can be installed by the 'admin' user if needed.

Server Usage

After this initial setup, the users will need a copy of the 'general-user' private key (preferably delivered on a USB stick, do not email it!) and its associated password (if set). Then each user can use PuTTY or ssh to log in, check the shared conda environments with conda env list, then choose the one to use with conda activate <env-name>.
KiTTY and Portable PuTTY save their session information to disk rather than to the Windows registry, so the admin can set up the connection details (IP address, ssh key location, etc.), save the session information and distribute it as a portable version for the students to use on shared computers.
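
For students on macOS or Linux, the closest equivalent of a saved PuTTY session is an entry in the OpenSSH client configuration file. The sketch below prints such an entry; `ssh_config_entry` is a hypothetical helper, and the alias, address, username and key path in the usage comment are placeholders.

```shell
# Hypothetical helper: print an OpenSSH client config entry for the server.
ssh_config_entry() {
    alias_name="$1"
    host="$2"
    user="$3"
    key="$4"
    printf 'Host %s\n' "$alias_name"
    printf '    HostName %s\n' "$host"
    printf '    User %s\n' "$user"
    printf '    IdentityFile %s\n' "$key"
    # offer only the key named above, not every key loaded in the agent
    printf '    IdentitiesOnly yes\n'
}

# Example usage (placeholder values):
#   ssh_config_entry teaching 203.0.113.10 student01 ~/.ssh/general-user.pem >> ~/.ssh/config
#   ssh teaching
```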

#!/bin/bash
# Usage: setup_new_user.bash 'newuser' 'group-to-add-to' 'another-group' 'yet-another-group' ...
# Ask user for sudo password (to be used when needed)
read -s -p "Enter Password for sudo: " sudoPW
# USERNAME (not USER, which already holds the name of the invoking user)
USERNAME="$1"; shift
printf "\n"
# create the user; if a group with the same name already exists, use it as the primary group
if [[ $(getent group "$USERNAME") ]]; then
    echo "$sudoPW" | sudo -S sh -c "adduser --ingroup $USERNAME --disabled-password --gecos '' $USERNAME"
else
    echo "$sudoPW" | sudo -S sh -c "adduser --disabled-password --gecos '' $USERNAME"
fi
# copy the ssh folder (with the shared key) to the new user and restrict access to it
echo "$sudoPW" | sudo -S sh -c "cp -r ~/.ssh /home/$USERNAME/ && \
    chown -R $USERNAME /home/$USERNAME/.ssh && \
    chmod 700 /home/$USERNAME/.ssh"
# process groups
for GROUP in "$@"; do
    # Check if group exists
    #if [[ $# -eq 2 ]]; then
    # GROUP=$2
    if [[ ! $(getent group "$GROUP") ]]; then
        read -e -p "Group $GROUP does not exist, do you want to create it? (press enter or Yes, or type [N/n] to exit): " -i "Yes" CREATE
        printf "\n"
        if [[ "$CREATE" != "Yes" ]]; then
            printf "Exiting without creating group and adding user\n"
            exit 1
        fi
        echo "$sudoPW" | sudo -S sh -c "groupadd $GROUP"
    fi
    echo "$sudoPW" | sudo -S sh -c "adduser $USERNAME $GROUP"
    # echo "User $USERNAME added to group $GROUP."
done
# fi
exit 0
#!/bin/bash
# Ask user for sudo password (run the line below outside of the script for processing multiple inputs using env_parallel)
# read -s -p "Enter Password for sudo: " sudoPW
# Usage:
# cat users_list.txt | env_parallel -j1 --env sudoPW setup_new_user_par {} 'group-to-add-to' 'another-group' 'yet-another-group'
USERNAME="$1"; shift
printf "\n"
# create the user; if a group with the same name already exists, use it as the primary group
if [[ $(getent group "$USERNAME") ]]; then
    echo "$sudoPW" | sudo -S sh -c "adduser --ingroup $USERNAME --disabled-password --gecos '' $USERNAME"
else
    echo "$sudoPW" | sudo -S sh -c "adduser --disabled-password --gecos '' $USERNAME"
fi
# copy the ssh folder (with the shared key) to the new user and restrict access to it
echo "$sudoPW" | sudo -S sh -c "cp -r ~/.ssh /home/$USERNAME/ && \
    chown -R $USERNAME /home/$USERNAME/.ssh && \
    chmod 700 /home/$USERNAME/.ssh"
# process groups
for GROUP in "$@"; do
    # Check if group exists
    #if [[ $# -eq 2 ]]; then
    # GROUP=$2
    if [[ ! $(getent group "$GROUP") ]]; then
        read -e -p "Group $GROUP does not exist, do you want to create it? (press enter or Yes, or type [N/n] to exit): " -i "Yes" CREATE
        printf "\n"
        if [[ "$CREATE" != "Yes" ]]; then
            printf "Exiting without creating group and adding user\n"
            exit 1
        fi
        # groupadd needs root privileges, as in setup_new_user.bash
        echo "$sudoPW" | sudo -S sh -c "groupadd $GROUP"
    fi
    echo "$sudoPW" | sudo -S sh -c "adduser $USERNAME $GROUP"
    # echo "User $USERNAME added to group $GROUP."
done
# fi
exit 0
IdoBar commented Mar 8, 2022

Added setup_new_user_par.bash, which allows setting up multiple users (from a file containing a list of them).
I also highly recommend using mamba instead of conda to install packages and solve dependencies much more quickly (see documentation).
