Tabea Kischka Tabea-K

  • Münster, Germany
@Tabea-K
Tabea-K / dupfilesbysize.sh
Last active June 1, 2016 12:29
Prints a list of all duplicate files (based on md5 hash), sorted by file size!
fdupes -r . | sed '/^$/d' | tr '\n' '\0' | xargs -0 ls -alhS
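For readers without fdupes, the same idea can be sketched in Python: group files by MD5 digest, keep the groups with more than one member, and sort by size. The function names `md5_of_file` and `duplicates_by_size` are hypothetical and not part of the gist; this is a minimal sketch, not a replacement for fdupes.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_by_size(root="."):
    """Return (size, path) pairs for files whose content occurs more than once, largest first."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[md5_of_file(path)].append(path)
    dupes = [
        (os.path.getsize(path), path)
        for paths in by_hash.values()
        if len(paths) > 1
        for path in paths
    ]
    return sorted(dupes, reverse=True)
```

Calling `duplicates_by_size(".")` and printing each pair gives a listing comparable to the one-liner above, with sizes in bytes instead of human-readable units.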
Tabea-K / liftNCBI36ToCRCh37.sh
Created March 2, 2016 15:42
Uses the Ensembl REST API to lift a chromosome coordinate from NCBI36 (hg18) to GRCh37 (hg19).
curl -s 'http://rest.ensembl.org/map/human/NCBI36/X:1000000..1000100:1/GRCh37?' -H 'Content-type:application/json' | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["mappings"][0]["mapped"]["end"])'
Tabea-K / refGen25primeUTRexons.sh
Last active June 1, 2016 12:45
Prints the coordinates of all 5'UTR exons from a refGene file. Based on https://www.biostars.org/p/10907/#10910
# gives the coordinates of all 5'UTR exons
awk '
BEGIN { OFS = "\t"; FS = "\t"} ;
{
# $7 is cdsStart
delete astarts;
delete aends;
split($10, astarts, /,/);
split($11, aends, /,/);
for(i=1; i <= length(astarts); i++){
Tabea-K / longblob2list.py
Created January 18, 2016 15:26
Converts a longblob variable into a list of integers. "90930917,90931703,90932054," > [90930917,90931703,90932054]
def longblob2list(longblob):
    """
    Converts a longblob variable into a list of integers.
    "90930917,90931703,90932054," > [90930917,90931703,90932054]
    """
    y = []
    for number in longblob.split(','):
        if number != "" and number != ",":
            y.append(int(number))
    return y
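A more compact variant of the same conversion, written as a list comprehension. The name `parse_int_list` is hypothetical, and the bytes handling is an assumption: some database drivers return blob columns as `bytes` under Python 3, which the gist's version would not accept.

```python
def parse_int_list(longblob):
    """Parse a comma-separated string of integers, ignoring empty fields."""
    # Assumption: blob values may arrive as bytes; decode them first.
    if isinstance(longblob, bytes):
        longblob = longblob.decode("ascii")
    return [int(field) for field in longblob.split(",") if field.strip()]


print(parse_int_list("90930917,90931703,90932054,"))
```

The trailing comma in the input produces one empty field, which the `if field.strip()` filter drops.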
Tabea-K / fasta_get_subseq.sh
Created January 13, 2016 12:48
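Prints a subsequence of a FASTA file containing a single sequence, specified by 1-based, inclusive start and end coordinates.
# get subsequence of a fasta file containing only one sequence
# usage: fasta_get_subseq FILE START END
fasta_get_subseq () {
    # join wrapped sequence lines into one line, then cut out the requested columns
    awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < "$1" | tail -n1 | cut -c"$2"-"$3"
}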
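The same extraction can be sketched in Python. This is an illustrative equivalent, not the gist's code; it reuses the name `fasta_get_subseq` only for readability and assumes the file holds exactly one sequence.

```python
def fasta_get_subseq(path, start, end):
    """Return bases start..end (1-based, inclusive) of the only sequence in a FASTA file."""
    with open(path) as handle:
        # Drop the header line and join the wrapped sequence lines.
        sequence = "".join(line.strip() for line in handle if not line.startswith(">"))
    # Convert the 1-based inclusive range to a Python slice.
    return sequence[start - 1:end]
```

As with `cut -c`, both coordinates are 1-based and the end position is included.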
Tabea-K / fasta2fastq.sh
Last active June 1, 2016 12:34
Converts FASTA (read from stdin) to FASTQ, using a single character ("I") for every quality score. This is useful for creating dummy files for debugging. Requires gawk, since gensub is a gawk extension.
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' | awk 'BEGIN {RS = ">" ; FS = "\n"} NR > 1 {print "@"$1"\n"$2"\n+"$1"\n"gensub(/./, "I", "g", $2)}'
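Where gawk is not available, the conversion can be sketched in Python. The function name `fasta_to_fastq` and its `quality_char` parameter are hypothetical; the output follows the same layout as the awk version (header, sequence, `+` followed by the header, constant quality string).

```python
def fasta_to_fastq(fasta_lines, quality_char="I"):
    """Return FASTQ lines for each FASTA record, using one constant quality character."""
    records = []
    header, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    fastq = []
    for name, seq in records:
        fastq.extend(["@" + name, seq, "+" + name, quality_char * len(seq)])
    return fastq
```

To use it as a filter, pass `sys.stdin` as `fasta_lines` and print the returned lines one per line.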
Tabea-K / nr_of_common_lines_per_column.sh
Last active July 9, 2020 08:17
Prints the number of identical values in a given column shared between two csv files. The first argument is the column number to compare. For example, you can compare the IDs given in a csv file. Mainly a wrapper around the comm command.
#!/usr/bin/env bash
# Prints the number of identical values in a given column shared between two
# csv files. The first argument is the column number to compare.
# For example, you can compare the IDs given in a csv file.
# Note: cut -f splits on tabs by default; add -d',' for comma-separated input.
cut -f "$1" "$2" | sort > .file1
cut -f "$1" "$3" | sort > .file2
# With no options, comm produces three-column output.
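The comparison described above can also be sketched in Python with set intersection. The function name `count_common_values` and the `delimiter` parameter are hypothetical. Unlike `comm`, which compares sorted lines one by one, this sketch counts each distinct value once, so results can differ when a column contains repeated values.

```python
import csv


def count_common_values(path_a, path_b, column, delimiter="\t"):
    """Count distinct values found in the given 1-based column of both files."""

    def column_values(path):
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            # Skip rows that are too short to contain the requested column.
            return {row[column - 1] for row in reader if len(row) >= column}

    return len(column_values(path_a) & column_values(path_b))
```

Pass `delimiter=","` for comma-separated input; the default matches the tab delimiter that `cut -f` uses.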
Tabea-K / maf2fasta.py
Created August 27, 2015 04:25
Converts a maf alignment file into fasta file.
#!/usr/bin/env python
"""
Created by Tabea Kischka at 2015-01-27 10:34:05
converts a last maf alignment into a fasta alignment
"""
import sys
path_to_alignio = '/home/tabeah/Scripts/alignio-maf'
Tabea-K / draw_aln.py
Created August 27, 2015 04:24
Uses TeX to create a pdf file with the visualization of an alignment.
#!/usr/bin/env python
"""
Creates a pdf alignment
"""
import sys
import os
import tempfile
input_filename = sys.argv[1]