Peter Kruczkiewicz peterk87

  • Canadian Food Inspection Agency
  • Canada
@peterk87
peterk87 / get_core_SNP_matrix_from_MSAs_parallel.R
Created March 27, 2014 20:46
R: Get SNP distance matrix from multiple MSAs
#! /usr/bin/Rscript --vanilla
library(getopt)
spec <- matrix(c(
'msa_dir_path','d',1,'character','MSA directory path (required)'
,'msa_file_ext','e',2,'character','MSA file extension (optional; default: "aln")'
,'out','o',2,'character','Output core SNP matrix CSV filename (optional; default: "core_distance_matrix.csv")'
,'n_cores','c',2,'integer','Number of cores to use for computation (optional; default: 2)'
,'dna_distance_model','m',2,'character','DNA distance model (default: "N").
peterk87 / README.md
Last active August 29, 2015 13:57
TSV: Salmonella enterica named serovar O- and H-antigens cross reference table
peterk87 / mist_json_parse.py
Last active August 29, 2015 13:56
Python: MIST JSON to CSV parser
#!/usr/bin/env python
import argparse
import textwrap
import os
import sys
import json
import re
peterk87 / bitbucket_dark.css
Created January 16, 2014 21:36
CSS: bitbucket.org dark theme
/*bitbucket.org dark css theme*/
body, aside {
background: #222 !important;
background-color: #222 !important;
color: #bbb !important;
}
h1, h2, h3, h4, h5, span {
background-color: transparent !important;
color: #FFC963 !important;
peterk87 / README.md
Last active May 10, 2017 11:46
JS+D3: Zoomable, pannable scatterplot with shift keypress enabled brush multiselect of data points

This JS+D3 gist plots the iris dataset from data.tsv as a scatterplot with zooming and panning enabled, plus a brush for selecting or deselecting points.

The "Get Selection" button gets the current selection of points and prints their ids to the JS console (i.e. console.log(selection);).

The "Clear Selection" button clears the current selection.

peterk87 / parseSNPs.py
Created January 13, 2014 22:52
Python: Parse SNPs from one or more multiple sequence alignments in multifasta format and output a concatenated SNP fasta, a basic SNP report, and/or [binarized] SNP table.
import argparse
import textwrap
import os
import sys
from datetime import timedelta, datetime
# function for reading a multifasta file
# returns a dictionary with sequence headers and nucleotide sequences
def get_seqs_from_fasta(filepath):
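
The preview stops at the function signature. As a rough illustration of the behaviour the comments describe (a dictionary mapping sequence headers to nucleotide sequences), a minimal sketch might look like the following. The name `read_multifasta` and the choice to accept any iterable of lines are assumptions made for this example; this is not the gist's actual code.

```python
def read_multifasta(lines):
    """Return {header: sequence} from an iterable of FASTA-formatted lines.

    Hypothetical helper: sequences that span several lines are joined,
    and the leading '>' is stripped from each header.
    """
    seqs = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            # a new record starts; store the previous one first
            if header is not None:
                seqs[header] = ''.join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # store the final record
    if header is not None:
        seqs[header] = ''.join(chunks)
    return seqs


# made-up two-record alignment, used only to exercise the function
example = """>strain-A
ACGT
ACGT
>strain-B
ACGA
ACGT
""".splitlines()
seqs = read_multifasta(example)
```

An open file handle can be passed in place of the list, for example `with open(path) as fh: seqs = read_multifasta(fh)`.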
peterk87 / sam.py
Created October 24, 2013 19:41
Python: Nesoni sam.py fixed self.file is NoneType error in Bam_reader
"""
SAM-based reboot
"""
import sys, os, subprocess, itertools, array, datetime, socket, heapq, tempfile
peterk87 / get_qual_colors.R
Created July 16, 2013 18:39
R: Get qualitative colors for a vector of characters
library(RColorBrewer)
qualitative_colours <- function(n, light=FALSE) {
# Get a specified number of qualitative colours if possible.
# This function will default to a continuous color scheme if there are more
# than 21 colours needed.
# rainbow12equal <- c("#BF4D4D", "#BF864D", "#BFBF4D", "#86BF4D", "#4DBF4D", "#4DBF86", "#4DBFBF", "#4D86BF", "#4D4DBF", "#864DBF", "#BF4DBF", "#BF4D86")
rich12equal <- c("#000040", "#000093", "#0020E9", "#0076FF", "#00B8C2", "#04E466", "#49FB25", "#E7FD09", "#FEEA02", "#FFC200", "#FF8500", "#FF3300")
peterk87 / get_snps_from_msas.py
Created May 9, 2013 16:34
Python: Get SNPs from MSAs
aln_snps = {}
for aln in aln_files:
    recs = list(SeqIO.parse(aln, 'fasta'))
    # strain names should be the last dash delimited element in fasta header
    strains = [rec.name.split('-')[-1] for rec in recs]
    # get a dictionary of strain names and sequences
    strain_seq = {rec.name.split('-')[-1]: str(rec.seq) for rec in recs}
    # get length of the MSA and check that all of the seq are the same length
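
The preview ends before the SNP-calling step. One plausible way to continue from a strain-to-sequence dictionary like `strain_seq` is sketched below: check that every sequence has the same length, then keep the alignment columns where the strains disagree. The function name `find_snp_columns` and the decision to skip columns containing gaps or ambiguous bases are assumptions for illustration, not necessarily what the gist does.

```python
def find_snp_columns(strain_seq):
    """Return {position: {strain: nucleotide}} for variable alignment columns.

    Hypothetical helper. Positions are 0-based. Columns containing anything
    other than A, C, G or T (gaps, Ns, ambiguity codes) are skipped, which is
    an assumption of this sketch.
    """
    lengths = {len(seq) for seq in strain_seq.values()}
    if len(lengths) != 1:
        raise ValueError('expected a non-empty alignment with equal-length sequences')
    strains = sorted(strain_seq)
    snps = {}
    for pos in range(lengths.pop()):
        column = [strain_seq[strain][pos].upper() for strain in strains]
        if any(nt not in 'ACGT' for nt in column):
            continue  # gap or ambiguous base in this column
        if len(set(column)) > 1:
            snps[pos] = dict(zip(strains, column))
    return snps


# made-up alignment of three strains
alignment = {'A': 'ACGTA', 'B': 'ACCTA', 'C': 'AC-TT'}
snps = find_snp_columns(alignment)
```

In this toy alignment, position 2 differs between strains but contains a gap, so only position 4 is reported.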
peterk87 / parse_tabular_blast.py
Created May 3, 2013 19:39
Python: Parse tabular BLAST results from file into 2D dictionary ([query id/name] -> [subject id/name] -> [list of BLAST results (one or more)])
# This file contains a set of functions for parsing out some useful information
# from BLAST results files saved in BLAST's tabular output format ("-outfmt 6").
# Biopython is required for reading multifasta files and storing sequences.
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# NOTE: Bio.Alphabet was removed in Biopython 1.78; this import needs an older release
from Bio.Alphabet import IUPAC
# if all of your genome sequences are within one multifasta file
recs = list(SeqIO.parse('all_genomes.fasta', 'fasta'))
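
The gist description mentions a 2D dictionary of the form [query id] -> [subject id] -> [list of BLAST results]. A minimal sketch of that structure, assuming the default 12-column `-outfmt 6` layout, could look like this. The names `BLAST_FIELDS` and `parse_outfmt6` are hypothetical and are not taken from the gist.

```python
from collections import defaultdict

# default column order of BLAST+ tabular output ("-outfmt 6")
BLAST_FIELDS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
                'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']
FLOAT_FIELDS = {'pident', 'evalue', 'bitscore'}


def parse_outfmt6(lines):
    """Return {query id: {subject id: [hit, ...]}} from tabular BLAST lines.

    Hypothetical helper: each hit is a dict keyed by BLAST_FIELDS, with
    numeric columns converted to int or float.
    """
    results = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.rstrip('\n')
        if not line or line.startswith('#'):
            continue  # skip blank lines and comment lines
        values = line.split('\t')
        if len(values) < len(BLAST_FIELDS):
            raise ValueError('expected 12 tab-separated columns: %r' % line)
        hit = dict(zip(BLAST_FIELDS, values))
        for field in BLAST_FIELDS[2:]:
            convert = float if field in FLOAT_FIELDS else int
            hit[field] = convert(hit[field])
        results[hit['qseqid']][hit['sseqid']].append(hit)
    return results


# made-up results: one query with two hits in genome1 and one in genome2
example_lines = [
    'geneA\tgenome1\t98.50\t200\t3\t0\t1\t200\t1001\t1200\t1e-100\t350',
    'geneA\tgenome1\t90.00\t100\t10\t0\t1\t100\t5001\t5100\t1e-30\t120',
    'geneA\tgenome2\t100.00\t200\t0\t0\t1\t200\t1\t200\t2e-105\t370',
]
hits = parse_outfmt6(example_lines)
```

Keeping a list at the innermost level preserves multiple hits of the same query against the same subject, as the gist title describes.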