Tabea Kischka Tabea-K

  • Münster, Germany
@Tabea-K
Tabea-K / fasta_get_subseq.sh
Created January 13, 2016 12:48
Prints a subsequence of a single-sequence FASTA file, specified by start and end coordinates.
# Get a subsequence of a FASTA file containing only one sequence.
# Usage: fasta_get_subseq FILE START END  (1-based, inclusive coordinates)
fasta_get_subseq () {
    awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < "$1" | tail -n1 | cut -c"$2"-"$3"
}
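The same idea can be sketched in Python: drop the header, join the wrapped sequence lines, then slice with 1-based inclusive coordinates as `cut -c` does. This is only an illustrative equivalent, not part of the gist. It takes the FASTA content as a string and assumes exactly one sequence; the example record `chrTest` is made up.

```python
def fasta_get_subseq(fasta_text, start, end):
    """Return bases start..end (1-based, inclusive) of a single-sequence FASTA string."""
    # Drop the header line and join the wrapped sequence lines into one string.
    seq = "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    )
    # cut -cM-N is 1-based and inclusive, so shift the start by one for slicing.
    return seq[start - 1:end]


# Hypothetical two-line record, used only for illustration.
example = ">chrTest\nACGTAC\nGTTTGA\n"
print(fasta_get_subseq(example, 5, 8))
```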
@Tabea-K
Tabea-K / longblob2list.py
Created January 18, 2016 15:26
Converts a longblob variable into a list of integers. "90930917,90931703,90932054," > [90930917,90931703,90932054]
def longblob2list(longblob):
    """
    Converts a longblob variable into a list of integers.
    "90930917,90931703,90932054," > [90930917,90931703,90932054]
    """
    y = []
    for number in longblob.split(','):
        if number != "" and number != ",":
            y.append(int(number))
    return y
@Tabea-K
Tabea-K / refGen25primeUTRexons.sh
Last active June 1, 2016 12:45
Prints the coordinates of all 5'UTR exons from a refGene file. Based on https://www.biostars.org/p/10907/#10910
# gives the coordinates of all 5'UTR exons
awk '
BEGIN { OFS = "\t"; FS = "\t"} ;
{
# $7 is cdsStart
delete astarts;
delete aends;
split($10, astarts, /,/);
split($11, aends, /,/);
for(i=1; i <= length(astarts); i++){
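The awk preview above is cut off, so the following is a separate Python sketch of the underlying idea rather than a reconstruction of the script. It assumes the usual refGene column order (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and, as a further assumption, handles the minus strand by taking the exon parts downstream of cdsEnd. Non-coding transcripts are not treated specially. The function name and the sample rows are hypothetical.

```python
def five_prime_utr_exons(refgene_line):
    """Return (chrom, start, end, strand) for each 5'UTR exon part of one refGene row."""
    fields = refgene_line.rstrip("\n").split("\t")
    chrom, strand = fields[2], fields[3]
    cds_start, cds_end = int(fields[6]), int(fields[7])
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]

    utr = []
    for start, end in zip(starts, ends):
        if strand == "+":
            # Plus strand: the 5'UTR lies upstream of cdsStart.
            lo, hi = start, min(end, cds_start)
        else:
            # Minus strand: the 5'UTR lies downstream of cdsEnd.
            lo, hi = max(start, cds_end), end
        if lo < hi:  # keep only exons that overlap the UTR
            utr.append((chrom, lo, hi, strand))
    return utr


# Made-up rows for illustration only.
plus_row = "0\tNM_TEST\tchr1\t+\t100\t1000\t250\t900\t3\t100,200,800,\t150,300,1000,"
minus_row = plus_row.replace("\t+\t", "\t-\t")
print(five_prime_utr_exons(plus_row))
print(five_prime_utr_exons(minus_row))
```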
@Tabea-K
Tabea-K / liftNCBI36ToCRCh37.sh
Created March 2, 2016 15:42
Uses the Ensembl REST API to lift a chromosome coordinate from NCBI36 (hg18) to GRCh37 (hg19).
# The map endpoint is map/:species/:asm_one/:region/:asm_two and converts from asm_one to asm_two.
curl -s 'http://rest.ensembl.org/map/human/NCBI36/X:1000000..1000100:1/GRCh37?' -H 'Content-type:application/json' | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["mappings"][0]["mapped"]["end"])'
@Tabea-K
Tabea-K / dupfilesbysize.sh
Last active June 1, 2016 12:29
Prints a list of all duplicate files (based on md5 hash), sorted by file size!
fdupes -r . | xargs ls -alhS
@Tabea-K
Tabea-K / fasta_order_genome_karyotypically.py
Last active May 16, 2018 12:22
Orders a multi fasta genome file by chromosome name in karyotypic order. I.e., after chr1 follows chr2, and not chr10.
#!/usr/local/bin/python
# File created by Tabea Kischka at Thu May 19 15:24:59 CEST 2016
# This script orders the sequence in a multi fasta file by the chromosome name in
# karyotypic order, and not lexicographically.
# I.e., it creates this order: chr1, chr2, chr10, ...
# instead of the order chr1, chr10, chr2
import sys
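Only the first lines of the script are shown here, so its actual implementation is not visible. One way to get the ordering described above is a sort key that compares numbered chromosomes as integers. In this sketch the function name `karyotypic_key`, the placement of X, Y and M after the autosomes, and the handling of other contigs are all assumptions.

```python
import re


def karyotypic_key(name):
    """Sort key: numbered chromosomes numerically, then X, Y, M/MT, then the rest."""
    # Strip an optional "chr" prefix so "chr2" and "2" sort the same way.
    label = re.sub(r"^chr", "", name, flags=re.IGNORECASE)
    if label.isdecimal():
        return (0, int(label), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if label.upper() in special:
        return (1, special[label.upper()], "")
    # Anything else (unplaced contigs etc.) goes last, in lexicographic order.
    return (2, 0, label)


names = ["chr10", "chrX", "chr2", "chrM", "chr1", "chrY", "chrUn_gl000220"]
print(sorted(names, key=karyotypic_key))
```

Sorting the FASTA records themselves would then amount to reading each header, sorting the records by this key, and writing them back out.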
@Tabea-K
Tabea-K / change_screen_window_title.sh
Last active June 1, 2016 12:27
Changes the title of a screen window
echo -e '\033k'"Super_title_includes_directory_${PWD}"'\033\\'
@Tabea-K
Tabea-K / find_broken_symlinks.sh
Last active June 1, 2016 12:28
Prints a list of all broken symlinks in the current dir and its subdirs
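No code is shown for this gist. As a rough illustration of the task only (not the author's script), a Python sketch could walk the directory tree and report every symlink whose target does not exist. The function name `find_broken_symlinks` is hypothetical.

```python
import os


def find_broken_symlinks(root="."):
    """Yield paths of symlinks under root whose targets do not exist."""
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk lists symlinks to directories in dirnames and all other
        # symlinks (including broken ones) in filenames, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it.
            if os.path.islink(path) and not os.path.exists(path):
                yield path


if __name__ == "__main__":
    for path in find_broken_symlinks("."):
        print(path)
```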
@Tabea-K
Tabea-K / csv2macdown.py
Created June 1, 2016 12:23
Converts a file in CSV format into Markdown (or rather, MacDown) table format.
#!/usr/bin/env python
"""
Turns tab delimited data to macdown table format.
Needs the csvtoolkit to be installed.
"""
import sys
import os
import tempfile
import argparse