mmterpstra

@mmterpstra
mmterpstra / SimpleDownSample.sh
Last active July 25, 2023 13:00
Downsample a bam file
#intro
#This shows how to downsample a bam without returning the file to its fastq state...
#pros: Fast
#cons: Might keep alignment artifacts/info from the bigger subpool (like better indel alignments). Worse than completely stripping alignment info and aligning the reads again.
#how to use
#create a config and edit the downsample fractions as seen below and run the samples, or just remove the leading part and create one yourself.
#parallelisation is done in a lazy way and should be fixed/checked.
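The idea of downsampling alignments directly, without going back to fastq, can be sketched in Python as follows. This is only an illustration and not the gist's own script: the function names `keep_read` and `downsample_sam_lines`, the `seed` parameter, and the choice of hashing the read name are all assumptions made for the example. Hashing the read name means both mates of a pair get the same keep/drop decision.

```python
import hashlib


def keep_read(read_name, fraction, seed=0):
    """Deterministically decide whether a read name falls in the kept fraction."""
    digest = hashlib.md5(f"{seed}:{read_name}".encode()).digest()
    # Compare the first 8 bytes of the hash against fraction * 2**64,
    # so roughly `fraction` of all read names pass.
    return int.from_bytes(digest[:8], "big") < fraction * 2**64


def downsample_sam_lines(lines, fraction, seed=0):
    """Yield SAM header lines unchanged and alignment lines whose QNAME is kept."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are always passed through.
            yield line
        elif keep_read(line.split("\t", 1)[0], fraction, seed):
            yield line
```

Because the decision depends only on the read name and the seed, the alignments that survive still carry whatever the aligner decided with the full read pool in view, which is the caveat listed under cons above.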
@mmterpstra
mmterpstra / findnotfinished.py
Last active April 28, 2021 19:41
Molgenis compute 5 script for finding failed jobs.
#!/usr/bin/env python3
import sys, os, time, subprocess
#use findnotfinished.py /path/to/jobsdir
try:
    input_dir = sys.argv[1]
except IndexError:
    # no jobs directory was given on the command line
    print('Die cannot open dir')
    sys.exit(1)
@mmterpstra
mmterpstra / submitd.pl
Last active July 30, 2020 01:08
submitd.pl queue crawlin since 2019
#!/usr/bin/perl
use strict;
use warnings;
use Proc::Daemon;
use Proc::PID::File;
use Getopt::Long;
use Log::Log4perl qw(:easy);
use File::Basename;
@mmterpstra
mmterpstra / README.md
Last active August 2, 2019 14:55
ToTheSkies

Bee Pic

@mmterpstra
mmterpstra / WipeSamSampleNames.pl
Last active April 2, 2020 10:36
Tool that tries to remove sample names from bam/sam files. Sample names are extracted from the SM field in the header using picard and samtools.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
#Needs samtools binary/picard jarfiles hardcoded in script
my $use = <<"END1";
Use $0 PICARDJAR FILE(s)
Tries to remove samplenames from bam/sam. Samplenames are extracted from the SM field in the header.
PICARDJAR Needs picard jarfile path in script
FILE Needs *.bam
@mmterpstra
mmterpstra / README.md
Last active April 21, 2017 14:41
trimbybed comparison

Intro

This should show whether trimming nugene data with a bed file containing the landing probes gives an added yield. Three different workflows are applied for comparison, as listed below.

Compared workflows

  • Trimbybed
    • bbduk trim linker (hisat has difficulties with unaligned ends)
    • hisat align
    • trimbybed
@mmterpstra
mmterpstra / FindPolyNucSitesRef.pl
Created March 9, 2017 13:19
Finds repeating basepair units in DNA. Use: FindPolyNucSitesRef.pl <[ATCGN]+> <REPEATLENGTH> <FASTA> | bedtools merge -i - > result.bed
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Scalar::Util qw/ looks_like_number/ ;
my $use = <<"END1";
use
$0 NUC LEN FASTA>BED
Traverses the FASTA file and returns a BED file with the locations of the NUCleotide sequences spanning a length greater than or equal to LEN.
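The traversal described in the usage text can be sketched in Python as follows. This is a hedged illustration, not a port of the Perl script: the names `read_fasta` and `find_poly_nuc_sites` are hypothetical, matching is assumed to be case-insensitive, and each sequence is held in memory as a whole. Coordinates are reported BED-style (0-based start, exclusive end).

```python
import math
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_poly_nuc_sites(lines, unit, min_len):
    """Yield (name, start, end) for runs of `unit` spanning at least `min_len` bases."""
    # Number of unit copies needed to reach the minimum span.
    repeats = max(1, math.ceil(min_len / len(unit)))
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), repeats), re.IGNORECASE)
    for name, seq in read_fasta(lines):
        for match in pattern.finditer(seq):
            yield name, match.start(), match.end()
```

Printing each tuple as tab-separated fields gives lines that can be piped into a merge step such as the `bedtools merge` call shown in the gist description.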
@mmterpstra
mmterpstra / downsample.sh
Last active January 18, 2017 14:05
Downsample Fastq data
set -e
set -x
set -o pipefail
TMP00=./results/tmp/
RAWDIR=/path/to/fastq/files
ml picard/2.2.2-foss-2016a-Java-1.8.0_74
(for fastq1 in ${RAWDIR}/*.fastq.gz; do
echo "## INFO ##"$(date)" ## file '"$fastq1"' start"
# http://www.howtogeek.com/116032/how-to-encrypt-your-home-folder-after-installing-ubuntu/
@mmterpstra
mmterpstra / IntersectGFF.pl
Last active November 24, 2016 15:00
intersect gtf/gff files
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $use = <<"END";
Simple extractor of the annotations between pre-defined regions in the gff format
input
perl $0 regions.gff ~/Downloads/Homo_sapiens.GRCh37.75.gtf > ~/regionsIntersectHomo_sapiens.GRCh37.75.gff
no validation of input....
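A region-based extraction of this kind can be sketched in Python as follows. It is an illustration under stated assumptions rather than the script's actual logic: the names `parse_interval` and `intersect_gff` are hypothetical, "between pre-defined regions" is interpreted here as "overlapping a region by at least one base", and, like the original, the sketch does no input validation. GFF/GTF coordinates are 1-based and inclusive, with the sequence name in column 1 and start/end in columns 4 and 5.

```python
from collections import defaultdict


def parse_interval(line):
    """Return (seqid, start, end) from a tab-separated GFF/GTF line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[3]), int(fields[4])


def intersect_gff(region_lines, annotation_lines):
    """Yield annotation lines that overlap any region on the same sequence."""
    regions = defaultdict(list)
    for line in region_lines:
        if line.strip() and not line.startswith("#"):
            seqid, start, end = parse_interval(line)
            regions[seqid].append((start, end))
    for line in annotation_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        seqid, start, end = parse_interval(line)
        # Inclusive coordinates: two intervals overlap when neither ends before the other starts.
        if any(start <= r_end and end >= r_start
               for r_start, r_end in regions.get(seqid, ())):
            yield line
```

The check scans every region on a sequence for each annotation line, which is simple but slow for large region sets; sorting the regions or using an interval tree would be the usual next step.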