Peter Kruczkiewicz peterk87

Steps to retrieve table of named serovars with O- and H-antigens

Downloaded "Antigenic Formulae Of The Salmonella Serovars 2007 9th edition" from:

https://www.pasteur.fr/ip/portal/action/WebdriveActionEvent/oid/01s-000036-089

Regex Steps

Remove page numbers:

match

JS+D3: Zoomable, pannable scatterplot with shift keypress enabled brush multiselect of data points

This JS+D3 gist creates a scatterplot with zooming and panning enabled as well as a brush for selecting or deselecting points using the iris dataset within data.tsv.

The "Get Selection" button gets the current selection of points and prints their ids to the JS console (i.e. console.log(selection);).

The "Clear Selection" button clears the current selection.

	#! /usr/bin/Rscript --vanilla

	library(getopt)

	spec <- matrix(c(
	'msa_dir_path','d',1,'character','MSA directory path (required)'
	,'msa_file_ext','e',2,'character','MSA file extension (optional; default: "aln")'
	,'out','o',2,'character','Output core SNP matrix CSV filename (optional; default: "core_distance_matrix.csv")'
	,'n_cores','c',2,'integer','Number of cores to use for computation (optional; default: 2)'
	,'dna_distance_model','m',2,'character','DNA distance model (default: "N").

	#!/usr/bin/env python

	import argparse
	import textwrap
	import os
	import sys
	import json
	import re

	/* bitbucket.org dark css theme */

	body, aside {
	background: #222 !important;
	background-color: #222 !important;
	color: #bbb !important;
	}
	h1, h2, h3, h4, h5, span {
	background-color: transparent !important;
	color: #FFC963 !important;

	import argparse
	import textwrap
	import os
	import sys
	from datetime import timedelta, datetime


	# function for reading a multifasta file
	# returns a dictionary with sequence headers and nucleotide sequences
	def get_seqs_from_fasta(filepath):
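A reader of this kind can be sketched without any third-party dependency. The version below is only an illustration under stated assumptions, not the gist's actual implementation: the name `read_fasta_as_dict` is hypothetical, the input is assumed to be plain-text FASTA in which each record starts with a `>` header line, and the full header text (minus the `>`) is used as the dictionary key.

```python
def read_fasta_as_dict(filepath):
    """Return {header: sequence} for every record in a multifasta file.

    Hypothetical helper: assumes each record starts with a '>' line and
    that sequence lines may be wrapped across several lines.
    """
    seqs = {}
    header = None
    chunks = []
    with open(filepath) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # store the previous record before starting a new one
                if header is not None:
                    seqs[header] = ''.join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    # store the final record
    if header is not None:
        seqs[header] = ''.join(chunks)
    return seqs
```

If two records share the same header, the later one overwrites the earlier one; whether that is acceptable depends on the input files.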


	"""

	SAM-based reboot

	"""

	import sys, os, subprocess, itertools, array, datetime, socket, heapq, tempfile

	library(RColorBrewer)

	qualitative_colours <- function(n, light=FALSE) {
	# Get a specified number of qualitative colours if possible.
	# This function will default to a continuous color scheme if there are more
	# than 21 colours needed.

	# rainbow12equal <- c("#BF4D4D", "#BF864D", "#BFBF4D", "#86BF4D", "#4DBF4D", "#4DBF86", "#4DBFBF", "#4D86BF", "#4D4DBF", "#864DBF", "#BF4DBF", "#BF4D86")
	rich12equal <- c("#000040", "#000093", "#0020E9", "#0076FF", "#00B8C2", "#04E466", "#49FB25", "#E7FD09", "#FEEA02", "#FFC200", "#FF8500", "#FF3300")


	aln_snps = {}
	for aln in aln_files:
	    recs = list(SeqIO.parse(aln, 'fasta'))
	    # strain names should be the last dash delimited element in fasta header
	    strains = [rec.name.split('-')[-1] for rec in recs]
	    # get a dictionary of strain names and sequences
	    strain_seq = {rec.name.split('-')[-1]: str(rec.seq) for rec in recs}
	    # get length of the MSA and check that all of the seq are the same length
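The length check named in the last comment, followed by a scan for variable columns, might look like the sketch below. It assumes `strain_seq` maps strain names to aligned sequence strings, as in the snippet above. The helper names `msa_length` and `variable_columns` are hypothetical, and ignoring gap (`-`) and `N` characters when deciding whether a column varies is an assumption, not something the original script states.

```python
def msa_length(strain_seq):
    """Return the alignment length, or raise if sequences differ in length."""
    lengths = {len(seq) for seq in strain_seq.values()}
    if len(lengths) != 1:
        raise ValueError('expected one alignment length, got: %s' % sorted(lengths))
    return lengths.pop()


def variable_columns(strain_seq, ignore='-N'):
    """Return 0-based positions where strains disagree.

    Characters listed in `ignore` (assumed here: gaps and N) are not
    counted as alleles.
    """
    n = msa_length(strain_seq)
    seqs = [seq.upper() for seq in strain_seq.values()]
    ignored = set(ignore)
    snp_positions = []
    for i in range(n):
        alleles = {seq[i] for seq in seqs} - ignored
        if len(alleles) > 1:
            snp_positions.append(i)
    return snp_positions
```

Collecting the positions per alignment file would give something that could be stored in `aln_snps`, although the structure the original script uses for that dictionary is not shown.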

	# This file contains a set of functions for parsing out some useful information
	# from BLAST results files saved in BLAST's tabular output format ("-outfmt 6").

	# Biopython is required for reading multifasta files and storing sequences.
	from Bio.Seq import Seq
	from Bio.SeqRecord import SeqRecord
	# note: Bio.Alphabet was removed in Biopython 1.78, so this import needs an older release
	from Bio.Alphabet import IUPAC
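For orientation, a parser for the default `-outfmt 6` layout can be sketched as follows. The twelve column names are BLAST+'s default tabular fields; everything else here (`BlastHit`, `parse_blast_tabular`, `best_hit_per_query`) is a hypothetical illustration rather than the functions this file actually defines, and the sketch assumes no custom column list was passed to `-outfmt`.

```python
from collections import namedtuple

# Default column order of BLAST+ tabular output ("-outfmt 6")
BLAST_FIELDS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
                'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']
INT_FIELDS = {'length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send'}
FLOAT_FIELDS = {'pident', 'evalue', 'bitscore'}

BlastHit = namedtuple('BlastHit', BLAST_FIELDS)


def parse_blast_tabular(lines):
    """Parse an iterable of tab-delimited BLAST lines into BlastHit tuples."""
    hits = []
    for line in lines:
        line = line.rstrip('\n')
        if not line or line.startswith('#'):
            continue  # skip blank lines and comment lines
        row = line.split('\t')
        if len(row) != len(BLAST_FIELDS):
            raise ValueError('expected %d columns, got %d' % (len(BLAST_FIELDS), len(row)))
        values = []
        for name, value in zip(BLAST_FIELDS, row):
            if name in INT_FIELDS:
                values.append(int(value))
            elif name in FLOAT_FIELDS:
                values.append(float(value))
            else:
                values.append(value)
        hits.append(BlastHit(*values))
    return hits


def best_hit_per_query(hits):
    """Return {qseqid: hit} keeping the highest-bitscore hit per query."""
    best = {}
    for hit in hits:
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best
```

An open file handle can be passed directly as `lines`, since the parser only iterates over it line by line.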

	from Bio import SeqIO
	# if all of your genome sequences are within one multifasta file
	recs = list(SeqIO.parse('all_genomes.fasta', 'fasta'))