Marcel Martin marcelm

@marcelm
marcelm / star-segfault.fasta
Created September 23, 2014 16:54
trying to map this to Danio rerio Zv9 with STAR leads to a segfault
>segfault
AACTGGAGCAGGTCCTGGTATTCTAACATTAATGTCTCCAGATGAACGCCCATAAACTTGGCTTTCACCTCAAACACTCC
CACAGTCTCTGATGGCGAGATCTCGAAAATTGCATTCTTATACTGATTTGTCTGCAGATCATCGATGTCAATAAGCACTC
CCTTCTCATGCAGACGGGCTGCTGTGTACTTCTGTGATACCTGCTTGCTCTTCTTTGCCTGTTTATCTCCAGGTTTCTTA
GAAACTTTTCCCTTGCTCCCTAAATTGTCCATGCAGGTCTTGATGTACTGATTGTAGTAGTCAATCTGCTCGTTGTAGAA
GGTTGCGGATGTCCTTGGCGATGTCATTGATTAGGTCCTGGTATTTCTTCTCAGGGTGTACTTTGCCCAACTCTCCCAGT
CTTTGCAGGTTGCTCTTGATCTTGTCCTTCTTGCCTTGCAAAGTAAGACTGTCATCTACAACTGGTTTGGCTTGCTTCAT
CTTCTCTGGGGTCTTTGCATCGCGAATGGCTCGCCGCTGCATAGCTCGCTGGTACTCTGCTTCCTGCTCAGGTGAAGCTA
CAGTCTCCAGAATCTCAGTAAGTGTGTCTCCTGGCTGAAATCTGATCACATCTACTATGAGCCTCTTGGTGTTAAGGAGA
AGAGTTTTGGCATCCATTTCAGCATTGGCCTCATCGGGAACATCAAATTTGTTGGTGAGGGTGAGGGAGACTTCTGTCTT
marcelm / info-file-with-qualities.patch
Created April 16, 2015 14:58
hack to print out qualities in cutadapt’s info file (will crash when trimming FASTA files)
diff --git i/cutadapt/scripts/cutadapt.py w/cutadapt/scripts/cutadapt.py
index 855721d..2eaf435 100755
--- i/cutadapt/scripts/cutadapt.py
+++ w/cutadapt/scripts/cutadapt.py
@@ -155,6 +155,7 @@ class AdapterCutter(object):
# TODO write only one line, even for multiple matches
for match in matches:
seq = match.read.sequence
+ qualities = match.read.qualities
if match is None:
marcelm / snakemake-pure-python.py
Last active November 29, 2023 00:45
pure Python module that uses snakemake to construct and run a workflow
#!/usr/bin/env python3
"""
Running this script is (intended to be) equivalent to running the following Snakefile:
include: "pipeline.conf" # Should be an empty file
shell.prefix("set -euo pipefail;")
rule all:
    input:
marcelm / pdfpages_oo.py
Created June 24, 2015 14:23
Plot multiple figures into a single PDF with matplotlib, using the object-oriented interface
"""
Plot multiple figures into a single PDF with matplotlib, using the
object-oriented interface.
"""
from matplotlib.backends.backend_pdf import FigureCanvasPdf, PdfPages
from matplotlib.figure import Figure
import numpy as np
with PdfPages('multi.pdf') as pages:
    for i in range(10):
marcelm / mismatches.py
Created September 16, 2015 09:06
Use pysam and pyfaidx to find mismatches in an interval
from pysam import AlignmentFile
from pyfaidx import Fasta
def has_mismatch_in_interval(reference, bamfile, chrom, start, end):
    """
    Return whether there is a mismatch in the interval (start, end) in any read mapping to the given chromosome.

    reference -- a pyfaidx.Fasta object or something that behaves similarly
    """
    for column in bamfile.pileup(chrom, start, end):
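The preview above stops at the pileup loop. One way the rest of the function could look is sketched below; this is an assumption about the intent, not the gist's actual body. It relies only on the pysam attributes `reference_pos`, `pileups`, `is_del`, `is_refskip`, `query_position` and `alignment.query_sequence`, treats the interval as half-open, and passes `truncate=True` so that only columns inside the interval are visited.

```python
def has_mismatch_in_interval(reference, bamfile, chrom, start, end):
    """
    Return True if any read base aligned within [start, end) on chrom
    differs from the reference base at the same position.

    reference -- a pyfaidx.Fasta object or something that behaves similarly
    bamfile -- a pysam.AlignmentFile or something that behaves similarly
    """
    # truncate=True restricts the yielded columns to the requested interval;
    # otherwise pysam also reports columns outside it that are covered by
    # reads overlapping the interval.
    for column in bamfile.pileup(chrom, start, end, truncate=True):
        ref_base = str(reference[chrom][column.reference_pos]).upper()
        for pileup_read in column.pileups:
            # Deletions and reference skips have no read base to compare.
            if pileup_read.is_del or pileup_read.is_refskip:
                continue
            query = pileup_read.alignment.query_sequence
            if query[pileup_read.query_position].upper() != ref_base:
                return True
    return False
```

Note that pysam's pileup applies its own default read and base-quality filters, so low-quality mismatching bases may not be seen by this loop at all.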
marcelm / bambai
Created December 8, 2015 13:20
Index a BAM file while sorting it
#!/bin/bash
set -euo pipefail
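# Test the argument count first: with "set -u", expanding $1 when no
# argument was given would abort the script before the usage is printed.
if [ $# -ne 1 ] || [ "$1" = -h ] || [ "$1" = --help ]; then
  echo \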
"Usage:
samtools sort -O bam -T prefix ... | bambai BAMPATH
Read a sorted BAM file from standard input, write it to BAMPATH and
index it at the same time (creating BAMPATH.bai)."
#!/usr/bin/env python3
"""
Mask low-quality bases in a FASTQ file with 'N'.
Adjust cutoff_front and cutoff_back below to use
different thresholds (currently: 20 at 5' end,
0 at 3' end).
Usage:
python3 qualmask.py input.fastq.gz > output.fastq
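The masking step described above might be sketched as follows. This is a guess at the approach, not the gist's code: it assumes the usual running-sum quality-trimming rule is applied from each end and that the bases which would have been trimmed are replaced by 'N' instead. The function names and the `base=33` Phred offset are hypothetical.

```python
def quality_trim_index(qualities, cutoff_front, cutoff_back, base=33):
    """
    Return (start, stop) so that qualities[start:stop] is the part that
    running-sum quality trimming would keep.
    """
    # Scan from the 5' end: extend the trimmed prefix while the accumulated
    # (cutoff - quality) sum keeps reaching a new maximum.
    start, total, best = 0, 0, 0
    for i, char in enumerate(qualities):
        total += cutoff_front - (ord(char) - base)
        if total < 0:
            break
        if total > best:
            best, start = total, i + 1
    # Same scan from the 3' end.
    stop, total, best = len(qualities), 0, 0
    for i in reversed(range(len(qualities))):
        total += cutoff_back - (ord(qualities[i]) - base)
        if total < 0:
            break
        if total > best:
            best, stop = total, i
    if start >= stop:
        start, stop = 0, 0
    return start, stop


def mask_low_quality_ends(sequence, qualities, cutoff_front=20, cutoff_back=0):
    """Replace the low-quality ends of sequence with 'N', keeping its length."""
    start, stop = quality_trim_index(qualities, cutoff_front, cutoff_back)
    return "N" * start + sequence[start:stop] + "N" * (len(sequence) - stop)
```

Because the read length is unchanged, the quality string can be written out as-is next to the masked sequence.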
marcelm / kill-zombie.sh
Created October 4, 2018 09:14
Hanging Nextflow job workaround
#!/bin/bash
# A workaround for an issue with Nextflow (which may actually be a bash bug),
# see <https://github.com/SciLifeLab/Sarek/issues/420>
#
# The problem is that Nextflow does not notice that a job has finished and
# hangs indefinitely.
#
# This script looks for zombie processes that are children of a script named
# .command.stub, and kills that script. This seems to let the pipeline continue
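The detection step described in the comments can be illustrated in Python. This is a minimal sketch, not a translation of the script: it assumes the process table comes from `ps -e -o pid=,ppid=,stat=,args=`, and the function names are hypothetical. Killing the returned PIDs is left to the caller.

```python
def parse_ps(text):
    """Parse the output of `ps -e -o pid=,ppid=,stat=,args=` into tuples."""
    processes = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, ppid, stat = int(fields[0]), int(fields[1]), fields[2]
        command = fields[3] if len(fields) > 3 else ""
        processes.append((pid, ppid, stat, command))
    return processes


def stub_scripts_with_zombie_children(processes, script_name=".command.stub"):
    """
    Return the sorted PIDs of processes whose command line mentions
    script_name and that have at least one zombie (state 'Z') child.
    """
    commands = {pid: command for pid, _, _, command in processes}
    parents = set()
    for _pid, ppid, stat, _command in processes:
        if stat.startswith("Z") and script_name in commands.get(ppid, ""):
            parents.add(ppid)
    return sorted(parents)
```

A caller could feed it the output of `subprocess.run(["ps", "-e", "-o", "pid=,ppid=,stat=,args="], capture_output=True, text=True).stdout` and pass each returned PID to `os.kill`.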
marcelm / condalock.sh
Created September 17, 2019 09:32
Create a Conda environment.lock.yml for macOS while running on Linux
#!/bin/bash
# This script creates both
# - environment.osx.lock.yml and
# - environment.linux.lock.yml
# regardless of the operating system it is running on. The trick is
# temporarily setting the subdir and subdirs keys in .condarc to
# what would be appropriate for the other operating system.
#
# It assumes that there exists a (manually managed) environment.yml file
marcelm / fasta2fastq
Last active January 29, 2021 14:11
FASTA to FASTQ conversion
#!/usr/bin/env python3
"""
Run with:
fasta2fastq < in.fasta > out.fastq
"""
import dnaio
import sys
with dnaio.open(sys.stdin.buffer) as inf:
    with dnaio.open(sys.stdout.buffer, mode="w", fileformat="fastq") as outf:
        for record in inf:
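The preview ends before the loop body. For illustration, the same conversion can be sketched with the standard library only. This is not the gist's dnaio-based code; the function names and the placeholder quality character 'I' (Phred 40 at offset 33) are assumptions, since FASTA carries no quality information and some value has to be made up.

```python
def read_fasta(lines):
    """Yield (name, sequence) tuples from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None and line:
            # Sequences may be wrapped over several lines; join them.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(lines, quality_char="I"):
    """Yield one four-line FASTQ record (as a string) per FASTA record."""
    for name, sequence in read_fasta(lines):
        yield f"@{name}\n{sequence}\n+\n{quality_char * len(sequence)}\n"
```

To use it as a filter, as in `fasta2fastq < in.fasta > out.fastq`, call `sys.stdout.writelines(fasta_to_fastq(sys.stdin))` after importing `sys`.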