A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
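A minimal sketch of the conversion described above, assuming `pd.to_numeric` is the call in question (the note does not say which one it uses). The Series contents are made up for illustration.

```python
import pandas as pd

# Illustrative data only: strings that all parse as numbers.
clean = pd.Series(["1", "2", "3.5"])
numeric = pd.to_numeric(clean)  # default errors="raise"

# A column with a non-numeric value triggers the error the note warns about.
mixed = pd.Series(["1", "oops", "3"])
try:
    pd.to_numeric(mixed)
    raised = False
except ValueError:
    raised = True

# errors="coerce" turns unparseable values into NaN instead of raising.
coerced = pd.to_numeric(mixed, errors="coerce")
```

Whether to raise or coerce depends on whether a bad value should stop the pipeline or be handled downstream as missing data.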
SPC s c remove highlight
**** File manipulation key bindings
File manipulation commands (start with ~f~):
| Key Binding | Description                                                    |
|-------------+----------------------------------------------------------------|
| ~SPC f c~   | copy current file to a different location                      |
| ~SPC f C d~ | convert file from unix to dos encoding                         |
| ~SPC f C u~ | convert file from dos to unix encoding                         |
# Variation of information (VI)
#
# Meila, M. (2007). Comparing clusterings-an information
# based distance. Journal of Multivariate Analysis, 98,
# 873-895. doi:10.1016/j.jmva.2006.11.013
#
# https://en.wikipedia.org/wiki/Variation_of_information
from math import log
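A hedged sketch of how the quantity named in the header could be computed. It is not the original snippet's code, which is cut off after the import. It assumes each clustering is a list of disjoint lists covering the same items. The function name and the choice of base-2 logarithms are assumptions for illustration.

```python
from math import log


def variation_of_information(X, Y):
    """VI between two partitions X and Y of the same items, in bits.

    Uses VI = -sum_ij r_ij * (log(r_ij / p_i) + log(r_ij / q_j)),
    where p_i and q_j are cluster proportions and r_ij is the
    proportion of items shared by cluster i of X and cluster j of Y.
    """
    n = float(sum(len(x) for x in X))
    total = 0.0
    for x in X:
        p = len(x) / n
        for y in Y:
            q = len(y) / n
            r = len(set(x) & set(y)) / n
            if r > 0.0:  # empty overlaps contribute nothing
                total += r * (log(r / p, 2) + log(r / q, 2))
    return abs(total)


same = variation_of_information([[1, 2], [3, 4]], [[1, 2], [3, 4]])
split = variation_of_information([[1, 2], [3, 4]], [[1, 2, 3, 4]])
```

Identical partitions give a distance of zero. Splitting one four-item cluster into two equal halves gives one bit.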
# Prior to the tutorial make sure that the script below runs without error on your R installation.
# What you need is a working installation of Stan: http://mc-stan.org/ .
# For installation instructions, see here:
# https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
# After installation you should be able to run this script which should output
# some summary statistics and some pretty plots, :)
# Generating some fake data
set.seed(123)
import sys
##########################################################
#requires snakemake, python3, pyfasta to be installed
#save this file and provide all the binaries and their path
#in variables below.
#to run flux pipeline:
#snakemake run_flux_pipeline
#to run rsem pipeline:
#snakemake run_rsem_pipeline
## RNA-seq analysis with DESeq2
## Stephen Turner, @genetics_blog
# RNA-seq data from GSE52202
# http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse52202. All patients with
# ALS, 4 with C9 expansion ("exp"), 4 controls without expansion ("ctl")
# Import & pre-process ----------------------------------------------------
# Import data from featureCounts
SAM and BAM filtering one-liners
@author: David Fredman, [email protected] (sans poly-A tail)
@dependencies: http://sourceforge.net/projects/bamtools/ and http://samtools.sourceforge.net/
Please extend with additional/faster/better solutions via a pull request!
BWA mapping (using piping for minimal disk I/O)
#!/bin/bash
# Usage: interleave_fastq.sh f.fastq r.fastq > interleaved.fastq
#
# Interleaves the reads of two FASTQ files specified on the
# command line and outputs a single FASTQ file to STDOUT.
#
# Can interleave 100 million paired reads (200 million total
# reads; 2 x 22 GB files), in memory (/dev/shm), in 6m54s (414s)
#
# Latest code: https://gist.github.com/4544979
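The interleaving described in the header can be sketched in Python as follows. This is an illustration under stated assumptions, not the shell script itself, whose body is not shown here. It assumes every read occupies exactly four lines, and the function name `interleave_fastq` is hypothetical.

```python
from itertools import islice


def interleave_fastq(forward, reverse):
    """Yield lines alternating one 4-line record from each input.

    `forward` and `reverse` are any iterables of lines, for example
    open file handles. Assumes strictly 4 lines per FASTQ record.
    """
    fwd, rev = iter(forward), iter(reverse)
    while True:
        rec_f = list(islice(fwd, 4))
        rec_r = list(islice(rev, 4))
        if not rec_f and not rec_r:
            return  # both inputs exhausted together
        if len(rec_f) != 4 or len(rec_r) != 4:
            raise ValueError("inputs must hold the same number of complete 4-line records")
        yield from rec_f
        yield from rec_r


# Tiny made-up reads, one pair only.
f_lines = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]
r_lines = ["@read1/2\n", "TGCA\n", "+\n", "IIII\n"]
merged = list(interleave_fastq(f_lines, r_lines))
```

With real files, the two arguments would be open handles and the output would be written with `sys.stdout.writelines(...)`, which streams record by record rather than holding either file in memory.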
# the following two lines give a two-line status, with the current window highlighted
hardstatus alwayslastline
hardstatus string '%{= kG}[%{G}%H%? %1`%?%{g}][%= %{= kw}%-w%{+b yk} %n*%t%?(%u)%? %{-}%+w %=%{g}][%{B}%m/%d %{W}%C%A%{g}]'
# huge scrollback buffer
defscrollback 5000
# no welcome message
startup_message off