Joe Healey jrjhealey

jrjhealey / fastafetcher.py
Created September 13, 2017 21:47
Pull out fastas from a multifasta based on keyword search as a string or keyfile
# Extract fasta files by their descriptors stored in a separate file.
# Requires biopython
from Bio import SeqIO
import sys
import argparse
def getKeys(args):
    """Turns the input key file into a list. May be memory intensive."""
jrjhealey / tabulateHHpred.py
Created July 5, 2017 21:06
Turn the output of HHsearch into tab-delimited text
# -*- coding: utf-8 -*-
"""
This script takes the .hhr files output by HHSuite and
turns the quite verbose file into a fully tabulated
version with all the fields separated, one line per
file. Thus, the file can be viewed simply in Excel etc.
It requires the non-standard pandas module.
"""
jrjhealey / Genbank_slicer.py
Last active September 20, 2017 19:11
Creating subsetted operons/gene genbank files from a 'parent' sequence!
#!/usr/bin/python
# This script is designed to take a genbank file and 'slice out'/'subset'
# regions (genes/operons etc.) and produce a separate file. This can be
# done explicitly by telling the script which base sites to use, or can
# 'decide' for itself by blasting a fasta of the sequence you're
# interested in against the Genbank you want to slice a record out of.
# Note, the script (obviously) does not preserve the index number of the
# bases from the original
# This script will calculate Shannon entropy from a MSA.
# Dependencies:
# Biopython, Matplotlib [optionally], Math
"""
Shannon's entropy equation (latex format):
H=-\sum_{i=1}^{M} P_i\,\log_2\,P_i
Entropy is a measure of the uncertainty of a probability distribution (P_1, ..., P_M)
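
The preview ends before any of the calculation is shown. The equation above can be sketched per alignment column as follows, using only the standard library. This is an assumed reading, not the gist's implementation: the names `column_entropy` and `msa_entropy` are hypothetical, the alignment is a plain list of equal-length strings rather than a Biopython object, and gap characters are counted as ordinary symbols.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # H = -sum(P_i * log2(P_i)), where P_i is the frequency of symbol i.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def msa_entropy(alignment):
    """Entropy of every column of an MSA given as equal-length strings."""
    # zip(*alignment) transposes the rows (sequences) into columns.
    return [column_entropy(col) for col in zip(*alignment)]


# Made-up toy alignment, for illustration only.
alignment = ["ACGT", "ACGA", "ACTC", "AGTG"]
entropies = msa_entropy(alignment)
for position, h in enumerate(entropies, start=1):
    print(position, round(h, 3))
```

A fully conserved column has zero entropy, while a column in which all four sequences differ reaches 2 bits, the maximum for four equally likely symbols.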