Peter Kruczkiewicz peterk87

  • Canadian Food Inspection Agency
  • Canada
@peterk87
peterk87 / chunk_seq.py
Created May 1, 2013 19:02
Python: Chunk up a nucleotide or amino acid sequence and return a list of tuples containing the chunk sequence name and the chunk sequence.
def chunk_seq(seq_name, sequence, chunk_size, chunk_increment):
    """
    Chunk up a sequence and return a list of tuples with the chunked up
    sequences and new sequence names with the position of the chunk in the
    original sequence.
    Args:
        seq_name: Sequence name.
        sequence: Nucleotide or amino acid sequence that is to be chunked up.
        chunk_size: Size of chunks (e.g. 30 bp)
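The preview above is cut off before the function body, so the following is only a minimal sketch of the sliding-window idea the docstring describes, not the gist's actual implementation. The name `chunk_seq_sketch`, the `name_start-end` naming scheme (1-based, inclusive), and the choice to drop a trailing partial window are all assumptions made for illustration.

```python
def chunk_seq_sketch(seq_name, sequence, chunk_size, chunk_increment):
    """Return a list of (chunk_name, chunk_sequence) tuples.

    Hypothetical sketch: the naming scheme and edge-case handling are
    assumptions, not taken from the original gist.
    """
    if chunk_size <= 0 or chunk_increment <= 0:
        raise ValueError("chunk_size and chunk_increment must be positive")

    chunks = []
    # Last window start that still yields a full-size chunk (0 for short input).
    last_start = max(len(sequence) - chunk_size, 0)
    for start in range(0, last_start + 1, chunk_increment):
        end = min(start + chunk_size, len(sequence))
        chunk = sequence[start:end]
        if not chunk:
            break  # empty input sequence
        # Assumed name format: 1-based inclusive coordinates in the original.
        chunks.append((f"{seq_name}_{start + 1}-{end}", chunk))
    return chunks


if __name__ == "__main__":
    for name, chunk in chunk_seq_sketch("seq1", "ABCDEFGHIJ", 4, 3):
        print(name, chunk)
```

With a `chunk_increment` smaller than `chunk_size` the windows overlap; with the two equal they tile the sequence, and in this sketch any leftover residues shorter than `chunk_size` at the end are dropped.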
@peterk87
peterk87 / heatmap.py
Created May 2, 2013 21:41
Python: hierarchically clustered heatmap using Matplotlib
## {{{ http://code.activestate.com/recipes/578175/ (r1)
### hierarchical_clustering.py
#Copyright 2005-2012 J. David Gladstone Institutes, San Francisco California
#Author Nathan Salomonis - [email protected]
#Permission is hereby granted, free of charge, to any person obtaining a copy
#of this software and associated documentation files (the "Software"), to deal
#in the Software without restriction, including without limitation the rights
#to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
#copies of the Software, and to permit persons to whom the Software is furnished
@peterk87
peterk87 / hierarchical_clustering_num_clusters_vs_distances_plots.py
Created May 2, 2013 23:35
Python: Hierarchical clustering plot and number of clusters over distances plot
from scipy.spatial.distance import *
from scipy.cluster.hierarchy import *
import pandas as pd
import numpy
import matplotlib.pyplot as plt
from matplotlib.pylab import figure
import pylab as pl
import pp
def num_clusters(hc, d):
@peterk87
peterk87 / parse_tabular_blast.py
Created May 3, 2013 19:39
Python: Parse tabular BLAST results from file into 2D dictionary ([query id/name] -> [subject id/name] -> [list of BLAST results (one or more)])
# This file contains a set of functions for parsing out some useful information
# from BLAST results files saved in BLAST's tabular output format ("-outfmt 6").
# Biopython is required for reading multifasta files and storing sequences.
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# Note: Bio.Alphabet was removed in Biopython 1.78; this import needs an older release.
from Bio.Alphabet import IUPAC
# If all of your genome sequences are within one multifasta file
recs = list(SeqIO.parse('all_genomes.fasta', 'fasta'))
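The parsing functions themselves are not visible in this preview. As a rough illustration of the nested `[query id] -> [subject id] -> [list of results]` structure named in the gist description, here is a standard-library sketch. It assumes the default 12-column `-outfmt 6` layout; the function name `parse_tabular_blast_lines` and the per-hit dictionary layout are hypothetical.

```python
from collections import defaultdict

# Default column order for BLAST+ "-outfmt 6".
BLAST_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_tabular_blast_lines(lines):
    """Build results[query_id][subject_id] -> list of hit dicts.

    Hypothetical sketch; assumes the default 12 tab-separated columns.
    """
    results = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "-outfmt 7"-style comment lines
        values = line.split("\t")
        if len(values) != len(BLAST_FIELDS):
            raise ValueError(f"expected {len(BLAST_FIELDS)} columns, got {len(values)}")
        hit = dict(zip(BLAST_FIELDS, values))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        # A query can hit the same subject more than once, hence the list.
        results[hit["qseqid"]][hit["sseqid"]].append(hit)
    return results
```

An open file handle can be passed directly, e.g. `with open(path) as fh: results = parse_tabular_blast_lines(fh)`, where `path` points at your own BLAST output file.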
@peterk87
peterk87 / get_snps_from_msas.py
Created May 9, 2013 16:34
Python: Get SNPs from MSAs
aln_snps = {}
for aln in aln_files:
    recs = list(SeqIO.parse(aln, 'fasta'))
    # strain names should be the last dash-delimited element in the FASTA header
    strains = [rec.name.split('-')[-1] for rec in recs]
    # get a dictionary of strain names and sequences
    strain_seq = {rec.name.split('-')[-1]: str(rec.seq) for rec in recs}
    # get length of the MSA and check that all of the seqs are the same length
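The preview ends at the length check, so the remaining steps are not shown. A minimal sketch of how the check and the SNP detection might proceed is given below, working on a plain strain-to-sequence dictionary like `strain_seq`. The function name `find_snp_columns` is hypothetical, and treating every differing character (including gaps and `N`) as a variant is a simplifying assumption that the original code may not share.

```python
def find_snp_columns(strain_seq):
    """Return {position: {strain: base}} for every variable alignment column.

    Hypothetical sketch. Positions are 0-based. Gaps ('-') and ambiguous
    bases are compared like any other character (an assumption).
    """
    lengths = {len(seq) for seq in strain_seq.values()}
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")
    if not lengths:
        return {}

    aln_len = lengths.pop()
    snps = {}
    for pos in range(aln_len):
        # Upper-case so that soft-masked bases do not register as variants.
        column = {strain: seq[pos].upper() for strain, seq in strain_seq.items()}
        if len(set(column.values())) > 1:
            snps[pos] = column
    return snps
```

Collecting the result per alignment file, for example `aln_snps[aln] = find_snp_columns(strain_seq)`, would fill a dictionary like the `aln_snps` initialised at the top of the preview, though whether the gist stores its SNPs in that shape is not visible here.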
@peterk87
peterk87 / get_qual_colors.R
Created July 16, 2013 18:39
R: Get qualitative colors for a vector of characters
library(RColorBrewer)
qualitative_colours <- function(n, light=FALSE) {
# Get a specified number of qualitative colours if possible.
# This function will default to a continuous color scheme if there are more
# than 21 colours needed.
# rainbow12equal <- c("#BF4D4D", "#BF864D", "#BFBF4D", "#86BF4D", "#4DBF4D", "#4DBF86", "#4DBFBF", "#4D86BF", "#4D4DBF", "#864DBF", "#BF4DBF", "#BF4D86")
rich12equal <- c("#000040", "#000093", "#0020E9", "#0076FF", "#00B8C2", "#04E466", "#49FB25", "#E7FD09", "#FEEA02", "#FFC200", "#FF8500", "#FF3300")
@peterk87
peterk87 / sam.py
Created October 24, 2013 19:41
Python: Nesoni sam.py fixed self.file is NoneType error in Bam_reader
"""
SAM-based reboot
"""
import sys, os, subprocess, itertools, array, datetime, socket, heapq, tempfile
@peterk87
peterk87 / parseSNPs.py
Created January 13, 2014 22:52
Python: Parse SNPs from one or more multiple sequence alignments in multifasta format and output a concatenated SNP fasta, a basic SNP report, and/or [binarized] SNP table.
import argparse
import textwrap
import os
import sys
from datetime import timedelta, datetime
# function for reading a multifasta file
# returns a dictionary with sequence headers and nucleotide sequences
def get_seqs_from_fasta(filepath):
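The body of `get_seqs_from_fasta` is not included in this preview. The comments say it returns a dictionary of sequence headers and nucleotide sequences, which can be sketched with the standard library alone as follows. The helper name `parse_fasta` and the decision to accept any iterable of lines (rather than a file path) are assumptions made so the example is easy to test.

```python
def parse_fasta(handle):
    """Return {header: sequence} from an iterable of FASTA lines.

    Hypothetical sketch: the full header line (minus '>') is used as the
    key, and a repeated header overwrites the earlier record.
    """
    parts = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            parts[header].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}
```

To read from disk, pass an open file, e.g. `with open(filepath) as fh: seqs = parse_fasta(fh)`.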
@peterk87
peterk87 / README.md
Last active May 10, 2017 11:46
JS+D3: Zoomable, pannable scatterplot with shift keypress enabled brush multiselect of data points

This JS+D3 gist plots the iris dataset in data.tsv as a scatterplot with zooming and panning enabled, plus a brush for selecting or deselecting points.

The "Get Selection" button gets the current selection of points and prints their IDs to the JS console (i.e. `console.log(selection);`).

The "Clear Selection" button clears the current selection.

@peterk87
peterk87 / bitbucket_dark.css
Created January 16, 2014 21:36
CSS: bitbucket.org dark theme
/*bitbucket.org dark css theme*/
body, aside {
  background: #222 !important;
  background-color: #222 !important;
  color: #bbb !important;
}
h1, h2, h3, h4, h5, span {
  background-color: transparent !important;
  color: #FFC963 !important;