Thomas Sandmann tomsing1

@tomsing1
tomsing1 / read_write_h5ad.R
Created November 5, 2021 19:12
Writing and reading h5ad files from R using the zellkonverter Bioconductor R package
libraries = c("zellkonverter", "SingleCellExperiment")
for (lib in libraries) {
  suppressPackageStartupMessages(library(lib, character.only = TRUE,
                                         quietly = TRUE))
}
# dummy SingleCellExperiment
ncells <- 100
u <- matrix(rpois(20000, 5), ncol = ncells)
v <- log2(u + 1)
@tomsing1
tomsing1 / python_and_r_via_basilisk.R
Created November 5, 2021 19:04
Using the basilisk Bioconductor package to execute python code in a controlled environment
libraries = c("basilisk", "mypackage", "zellkonverter",
"SingleCellExperiment")
for (lib in libraries) {
  suppressPackageStartupMessages(library(lib, character.only = TRUE,
                                         quietly = TRUE))
}
# create dummy SingleCellExperiment
ncells <- 100
u <- matrix(rpois(20000, 5), ncol = ncells)
@tomsing1
tomsing1 / sra_download_via_s3.sh
Last active October 1, 2021 19:55
Quickly download SRA archives using the AWS CLI and then extract the file using fasterq-dump
#!/usr/bin/env bash
set -e
set -x
set -o pipefail
# This bash script retrieves SRA archives for the runs listed in the
# SRR_Acc_List.txt file (available from the SRA Run Selector).
#
# It requires:
@tomsing1
tomsing1 / stitch_and_align.sh
Created September 23, 2021 04:04
Bash script to stitch & align paired end reads, using dockerized fastp, minimap2 and multiqc tools
#!/usr/bin/env bash
set -e
set -x
set -o pipefail
declare -r CORES=$(getconf _NPROCESSORS_ONLN)
declare -r REFERENCE="s3://your-reference-bucket/gencode/release_30/GRCh38_p12/"
declare -r S3_FASTQ="s3://your-bucket/Fastq/"
declare -r MIN_OVERLAP=10 # the larger the better
@tomsing1
tomsing1 / calculate_clr.R
Created July 7, 2021 21:20
calculate the centered log ratio (CLR) for a data matrix (for compositional data)
#' Centered log-ratio transformation
#'
#' @param m Count matrix with features in rows and samples (cells) in columns
#' @return matrix
#' @export
clr <- function(m) {
  apply(m, 2, function(x) {
    log1p(x = x / (exp(x = sum(log1p(x = x[x > 0]), na.rm = TRUE) / length(x = x))))
  })
}
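To see what the formula computes for one column, here is a minimal Python sketch of the same calculation using only the standard library. The function name `clr_column` and the example counts are illustrative and not part of the gist; missing values (handled by `na.rm = TRUE` in the R code) are not covered here.

```python
import math

def clr_column(x):
    """Apply the gist's per-column formula to one list of counts.

    Mirrors log1p(x / exp(sum(log1p(x[x > 0])) / length(x))):
    the denominator sums log1p over the positive entries only but
    divides by the full column length, zeros included.
    """
    log_sum = sum(math.log1p(v) for v in x if v > 0)
    scale = math.exp(log_sum / len(x))
    return [math.log1p(v / scale) for v in x]

# Illustrative counts for a single sample (assumed data)
print(clr_column([0, 1, 3]))
```

For these counts the scaling factor works out to about 2, so the transformed values are approximately log1p(0), log1p(0.5) and log1p(1.5).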
@tomsing1
tomsing1 / subsample_bam.sh
Created June 7, 2021 21:58
Shell script to sub-sample a BAM file
# Shell function to subsample to a fixed number of alignments,
# requiring the sambamba and samtools suites to be available.
# see https://www.biostars.org/p/76791/
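function SubSample {
  local FACTOR=$(samtools idxstats "$1" | cut -f3 | \
    awk -v COUNT="$2" 'BEGIN {total=0} {total += $1} END {print COUNT/total}')
  # Compare numerically: FACTOR is a fraction and may be printed in
  # scientific notation, which a [[ ... > ... ]] string comparison misorders.
  if awk -v f="$FACTOR" 'BEGIN {exit !(f > 1)}'
  then
    echo '[ERROR]: Requested number of reads exceeds total read count in' "$1" '-- exiting' && exit 1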
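The sampling fraction in this snippet is the requested read count divided by the sum of the third column (mapped reads) of `samtools idxstats` output. A minimal Python sketch of that arithmetic follows, with the check against 1 done on numbers. The function name `sampling_fraction` and the example text are made up for illustration and are not part of the gist.

```python
def sampling_fraction(idxstats_text, requested):
    """Return requested / total mapped reads from idxstats-style text.

    Each line is tab-separated: reference name, sequence length,
    mapped reads, unmapped reads. Only the third column is summed.
    """
    total = 0
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            total += int(fields[2])
    if total == 0:
        raise ValueError("no mapped reads found")
    fraction = requested / total
    if fraction > 1:
        raise ValueError("requested number of reads exceeds total read count")
    return fraction

# Hypothetical idxstats output: two references plus the unmapped ('*') line
example = "chr1\t1000\t300\t0\nchr2\t2000\t700\t0\n*\t0\t0\t5\n"
print(sampling_fraction(example, 250))  # 250 / 1000
```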
@tomsing1
tomsing1 / filter_amplicon_alignments.py
Created June 4, 2021 21:39
Python script to filter alignments based on the position of the R1 and R2 reads
"""Filter alignments based on R1 and R2 start positions
Example: python extract_alignments.py example.bam filtered.bam
"""
import argparse
import sys
from pysam import AlignmentFile
parser = argparse.ArgumentParser()
@tomsing1
tomsing1 / docker_mosh.sh
Created May 25, 2021 04:47
Use mosh to log into a running docker-machine instance
#!/usr/bin/env bash
# This script provisions an EC2 instance via docker-machine and takes the
# following positional arguments
# 1. docker-machine name (bioinfo-sandmann)
# 2. login username (ubuntu)
set -e # exit upon error
set -o nounset # no unset variables
@tomsing1
tomsing1 / seqrc_bed_file_generation.md
Created April 27, 2021 18:37
How to create BED files for use with RSeQC
  1. Download the following two BED files from the RSeQC website, hosted on SourceForge:
     • hg38_RefSeq.bed.gz: All human RefSeq transcripts
     • hg38.HouseKeepingGenes.bed.gz: Subset of human RefSeq transcripts considered housekeeping genes
  2. Extract the RefSeq identifiers from the HouseKeepingGenes file.
  3. Map the RefSeq identifiers to orthologous genes in your species of interest, e.g. using Ensembl's BioMart web interface. Make sure to return the gene stable ID for the target species.
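The identifier-extraction step can be sketched in Python. This assumes the BED file keeps the transcript identifier in its fourth (name) column, as in the standard BED layout; the function name `bed_names` and the identifiers in the demo file are made up for illustration.

```python
import gzip
import os
import tempfile

def bed_names(path):
    """Collect the unique values of the 4th (name) BED column, in file order."""
    opener = gzip.open if path.endswith(".gz") else open
    seen = {}
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and optional header/comment lines
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 4:
                seen.setdefault(fields[3], None)
    return list(seen)

# Demo on a tiny, made-up gzipped BED file
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bed.gz")
    with gzip.open(demo, "wt") as out:
        out.write("chr1\t100\t200\tNM_000001\t0\t+\n")
        out.write("chr1\t300\t400\tNM_000002\t0\t-\n")
        out.write("chr2\t100\t200\tNM_000001\t0\t+\n")
    print(bed_names(demo))
```

The resulting list, one identifier per line, could then be pasted into or uploaded to BioMart for the mapping step.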
@tomsing1
tomsing1 / nextflow_tower.md
Created February 15, 2021 20:15
Setting up nextflow tower locally

Installing Nextflow on a Mac

  • Install Java 8:
    brew install adoptopenjdk8
  • Install Nextflow by running the following in the current directory: curl https://get.nextflow.io | bash
  • Optional: Move the nextflow binary to a directory that is in the PATH.