
Philipp Bayer philippbayer

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8
@philippbayer
philippbayer / torch.md
Last active October 16, 2023 08:01
installing torch/transformers under ROCm on Pawsey

Here's my alias in .bashrc for getting a gpu-dev instance based on https://support.pawsey.org.au/documentation/display/US/Setonix+GPU+Partition+Quick+Start

alias getgpunode='salloc -p gpu-dev --nodes=1 --gpus-per-node=1 --account=${PAWSEY_PROJECT}-gpu'

First, to make a fresh environment:

mamba create -p `pwd`/transformers transformers python=3.10

Install Torch built for the closest available ROCm version: there's nothing for 5.4.3 (the current 'new' version on Pawsey) and nothing for 5.2.3 (the default version). Also set pip's cache directory to somewhere on /scratch.
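A minimal sketch of that install step, building the pip command from Python so it can be printed and checked before anything runs. Assumptions: the ROCm version string is a placeholder (the text doesn't say which wheel was closest), the scratch path is hypothetical, and the wheels come from PyTorch's per-ROCm index at `https://download.pytorch.org/whl/rocm<version>`. `--cache-dir` and `--index-url` are standard pip options; `torch_rocm_install_cmd` is a hypothetical helper name.

```python
import subprocess
import sys


def torch_rocm_install_cmd(rocm_version: str, cache_dir: str) -> list[str]:
    """Build a pip command that installs torch from PyTorch's ROCm wheel index.

    rocm_version is a placeholder such as "X.Y": use whichever ROCm build
    is closest to the ROCm module loaded on the cluster.
    """
    index_url = f"https://download.pytorch.org/whl/rocm{rocm_version}"
    return [
        sys.executable, "-m", "pip", "install",
        "--cache-dir", cache_dir,   # keep pip's cache on /scratch, not in $HOME
        "--index-url", index_url,   # ROCm-specific wheel index
        "torch",
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with the real ROCm version and your scratch path.
    cmd = torch_rocm_install_cmd("X.Y", "/scratch/PROJECT/USER/pip-cache")
    print(" ".join(cmd))
    if "--run" in sys.argv[1:]:
        subprocess.run(cmd, check=True)
```

Running it with `--run` from inside the mamba environment means `sys.executable` is that environment's Python, so torch lands in the right place.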

import os
import sys
import argparse
from statistics import mean
'''
INPUT: tab-delimited blastn output, assuming it was produced with this column format (taxonomy IDs in column 3):
-outfmt "6 qseqid sseqid staxids sscinames scomnames sskingdoms pident length qlen slen mismatch gapopen gaps qstart qend sstart send stitle evalue bitscore qcovs qcovhsp"
This script also assumes that the input has already been filtered to hits with more than 90% identity (column 7), e.g.:
$ awk '{if ($7 > 90) print}' all_results.tsv > all_results.90perc.tsv
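A minimal Python sketch of reading rows in that 22-column format and applying the same `$7 > 90` identity filter as the awk line. Summarising mean `pident` per `staxids` value is only an illustration of what `statistics.mean` could be used for; the original script's logic isn't shown, and `read_hits` and `mean_identity_per_taxid` are hypothetical names.

```python
import sys
from collections import defaultdict
from statistics import mean

# Column order of: -outfmt "6 qseqid sseqid staxids ... qcovs qcovhsp"
COLUMNS = ("qseqid sseqid staxids sscinames scomnames sskingdoms pident length "
           "qlen slen mismatch gapopen gaps qstart qend sstart send stitle "
           "evalue bitscore qcovs qcovhsp").split()


def read_hits(lines, min_pident=90.0):
    """Yield one dict per hit whose percent identity is strictly above min_pident
    (the same test as awk '$7 > 90')."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        row = line.split("\t")
        if len(row) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}: {line!r}")
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) > min_pident:
            yield hit


def mean_identity_per_taxid(hits):
    """Illustrative summary: mean pident for each staxids value."""
    by_taxid = defaultdict(list)
    for hit in hits:
        by_taxid[hit["staxids"]].append(float(hit["pident"]))
    return {taxid: mean(values) for taxid, values in by_taxid.items()}


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        summary = mean_identity_per_taxid(read_hits(handle))
    for taxid, ident in sorted(summary.items()):
        print(f"{taxid}\t{ident:.2f}")
```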
#!/bin/bash -l
# SLURM directives
#
# This is an array job with four subtasks
#SBATCH --job-name=align
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=1
#SBATCH --partition=work
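Slurm array jobs are requested with `#SBATCH --array` (for four subtasks, e.g. `--array=0-3`), and each subtask learns its own index from the `SLURM_ARRAY_TASK_ID` environment variable. A minimal Python sketch of using that index to pick one input per subtask; the input names and `input_for_task` are hypothetical.

```python
import os

# Hypothetical inputs, one per array subtask (e.g. submitted with --array=0-3).
INPUTS = ["sample1.fastq.gz", "sample2.fastq.gz", "sample3.fastq.gz", "sample4.fastq.gz"]


def input_for_task(inputs, environ=os.environ):
    """Return the input assigned to this array subtask via SLURM_ARRAY_TASK_ID."""
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(inputs):
        raise IndexError(f"task {task_id} has no input (only {len(inputs)} inputs)")
    return inputs[task_id]


if __name__ == "__main__":
    print(input_for_task(INPUTS))
```

If the array is numbered `--array=1-4` instead, subtract one from the task ID before indexing.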
philippbayer / nextflow.config
Last active June 5, 2024 02:00
My current Pawsey nextflow.config
// put this as nextflow.config in the folder of your run on Pawsey's Setonix
// I settled on this command for nf-core/mag:
// nextflow run nf-core/mag --input '*R{1,2}.fastq.gz' --outdir results
// --skip_spades --cat_db https://tbb.bio.uu.nl/bastiaan/CAT_prepare/CAT_prepare_20210107.tar.gz
// --gtdb 'https://data.gtdb.ecogenomic.org/releases/release202/202.0/auxillary_files/gtdbtk_r202_data.tar.gz'
// -resume -profile singularity
// --refine_bins_dastool --postbinning_input both
// --busco_download_path /SOMEWHERE/busco-data.ezlab.org/v5/data
// -disable-jobs-cancellation

First, a tab-delimited file with genome sizes (chromosome name in the first column, length in the second):

Gm01	58711475	26	60	61
Gm03	52519505	59690052	60	61

etc.
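Plotting all chromosomes along one x-axis typically needs each chromosome's cumulative start offset. A minimal Python sketch of that step, assuming only the first two columns (name, length) of the file above matter; it is not the original plotting code, and `read_genome_sizes` and `cumulative_offsets` are hypothetical names.

```python
import sys


def read_genome_sizes(lines):
    """Parse (name, length) from the first two tab-separated columns."""
    sizes = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        sizes.append((fields[0], int(fields[1])))
    return sizes


def cumulative_offsets(sizes):
    """Map each chromosome to the genome-wide coordinate of its first base (0-based)."""
    offsets, total = {}, 0
    for name, length in sizes:
        offsets[name] = total
        total += length
    return offsets, total


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        offsets, total = cumulative_offsets(read_genome_sizes(handle))
    for name, start in offsets.items():
        print(f"{name}\t{start}")
    print(f"total\t{total}")
```

A position `p` on a chromosome then plots at `offsets[name] + p`.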

Then, to plot the thing:

philippbayer / changes.md
Last active November 9, 2023 15:20
createRepeatLandscape.pl changes for EDTA/TESorter classes

I changed the following in RepeatMasker/util/createRepeatLandscape.pl to make classes reported by EDTA and TESorter appear in the plot.

Around line 220 I added CACTA repeats as their own class:

              [ 'DNA/Transib',    '#FF9972' ],
              [ 'DNA/CACTA',      '#D45B2C' ],

I got the color by googling #FF9972 and then adjusting Google's color picker until I had a similar-looking shade.

Then, around line 700, I added all these translations:

philippbayer / EDTA.md
Last active March 3, 2021 02:34
Running EDTA on Pawsey with Singularity

First, to download EDTA:

module load singularity
singularity pull EDTA.sif docker://quay.io/biocontainers/edta:1.9.4--0

That'll make a new file called EDTA.sif containing everything in the EDTA v1.9.4 container.

Then we have a problem: Pawsey allows only 1 million files per user, and running EDTA on several genomes at once will hit that limit.
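To see how close a directory tree is to that quota before launching more runs, a minimal Python sketch that counts files with `os.walk`. The 1 million figure comes from the text above; directories also consume inodes on many filesystems, so treat the count as a lower bound. `count_files` is a hypothetical name.

```python
import os
import sys

FILE_QUOTA = 1_000_000  # per-user file limit quoted above


def count_files(root):
    """Count non-directory entries below root; directories themselves are not counted."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    n = count_files(root)
    print(f"{n} files under {root} ({n / FILE_QUOTA:.1%} of {FILE_QUOTA:,})")
```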

philippbayer / covid_vs_cash.Rmd
Created July 2, 2020 14:26
Plotting whether there's a correlation between I've Been Everywhere and Covid-19 cases
```{r setup}
library(tidyverse)
library(ggrepel)
```
```{r}
df <- readxl::read_xlsx('./Covid_vs_State.xlsx')
head(df)
```
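Whether there's a correlation is usually summarised with Pearson's r. A minimal Python sketch of the formula, independent of the spreadsheet (its column names aren't shown, so pairing per-state song mentions with case counts is left to the reader); `pearson_r` is a hypothetical name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)
```

On Python 3.10+ `statistics.correlation` computes the same quantity; in R, `cor()` does.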
philippbayer / deepvariant.sh
Created February 21, 2020 15:56
GPU-enabled DeepVariant via Singularity, not Docker
module load singularity
# we have to convert the Docker image into a Singularity image. Be careful to use the GPU Docker image 0.9.0-gpu, not 0.9.0
# the following command doesn't work on copyQ (I get `mksquashfs not found`) but works fine on other systems
singularity pull $MYGROUP/singularity/myRepository/deepvariant.sif docker://google/deepvariant:0.9.0-gpu
# you may have to create the target folder ($MYGROUP/singularity/myRepository) first
# Then, using the example data described here: https://github.com/google/deepvariant/blob/r0.9/docs/deepvariant-quick-start.md
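A minimal Python sketch of the GPU run on the pulled image, building a `singularity exec --nv` call to DeepVariant's `/opt/deepvariant/bin/run_deepvariant` wrapper with its `--model_type`, `--ref`, `--reads`, `--output_vcf`, `--num_shards` and optional `--regions` flags. All paths are hypothetical placeholders for the quick-start input and output folders, and `deepvariant_cmd` is a hypothetical helper name.

```python
import shlex
import subprocess
import sys


def deepvariant_cmd(sif, ref, reads, output_vcf, regions=None, num_shards=1,
                    model_type="WGS", bind_dirs=()):
    """Build a `singularity exec --nv` call to DeepVariant's run_deepvariant wrapper."""
    cmd = ["singularity", "exec", "--nv"]  # --nv exposes the host's NVIDIA GPU
    for d in bind_dirs:
        cmd += ["-B", d]  # make host directories visible inside the container
    cmd += [
        sif,
        "/opt/deepvariant/bin/run_deepvariant",
        f"--model_type={model_type}",
        f"--ref={ref}",
        f"--reads={reads}",
        f"--output_vcf={output_vcf}",
        f"--num_shards={num_shards}",
    ]
    if regions:
        cmd.append(f"--regions={regions}")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: point these at the quick-start input/output folders.
    cmd = deepvariant_cmd("deepvariant.sif", "/path/to/input/ref.fasta",
                          "/path/to/input/reads.bam", "/path/to/output/output.vcf.gz",
                          bind_dirs=("/path/to/input", "/path/to/output"))
    print(shlex.join(cmd))
    if "--run" in sys.argv[1:]:
        subprocess.run(cmd, check=True)
```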