A personal diary of DataFrame munging over the years.
Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
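As a minimal sketch of the conversion described above, assuming `pd.to_numeric` is the intended call: the DataFrame and the column names `amount` and `mixed` are invented for illustration. By default a value that cannot be parsed raises `ValueError`; `errors="coerce"` is one way to get `NaN` instead.

```python
import pandas as pd

# Hypothetical data: "amount" is cleanly numeric, "mixed" is not.
df = pd.DataFrame({"amount": ["1", "2.5", "3"], "mixed": ["1", "abc", "3"]})

# Clean column: converts without complaint.
df["amount"] = pd.to_numeric(df["amount"])

# Column with a non-numeric value: the default behaviour is to raise.
raised = False
try:
    pd.to_numeric(df["mixed"])
except ValueError:
    raised = True

# errors="coerce" turns unparseable values into NaN instead of raising.
df["mixed"] = pd.to_numeric(df["mixed"], errors="coerce")
```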
SPC s c remove highlight
**** File manipulation key bindings
File manipulation commands (start with ~f~):
| Key Binding | Description |
|-------------+----------------------------------------------------------------|
| ~SPC f c~ | copy current file to a different location |
| ~SPC f C d~ | convert file from unix to dos encoding |
| ~SPC f C u~ | convert file from dos to unix encoding |
# Variation of information (VI)
#
# Meila, M. (2007). Comparing clusterings-an information
# based distance. Journal of Multivariate Analysis, 98,
# 873-895. doi:10.1016/j.jmva.2006.11.013
#
# https://en.wikipedia.org/wiki/Variation_of_information
from math import log
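The preview above stops at the import, so the snippet's own implementation is not shown. The following is only a sketch of how the variation of information could be computed, using the standard definition VI(X; Y) = -sum over (i, j) of r_ij * [log(r_ij / p_i) + log(r_ij / q_j)]. The function name and the choice to represent each clustering as a sequence of per-item labels are assumptions made for this example.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI (in nats) between two clusterings given as label sequences.

    labels_a[i] and labels_b[i] are the cluster labels of item i in each
    clustering (a representation assumed for this sketch).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n == 0:
        return 0.0

    count_a = Counter(labels_a)               # cluster sizes in clustering A
    count_b = Counter(labels_b)               # cluster sizes in clustering B
    joint = Counter(zip(labels_a, labels_b))  # sizes of the intersections

    vi = 0.0
    for (a, b), n_ab in joint.items():
        # r_ab / p_a simplifies to n_ab / count_a[a]; likewise for b.
        vi -= (n_ab / n) * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi
```

Identical clusterings give a distance of zero, and the value is symmetric in its two arguments.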
# Prior to the tutorial make sure that the script below runs without error on your R installation.
# What you need is a working installation of Stan: http://mc-stan.org/ .
# For installation instructions, see here:
# https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
# After installation you should be able to run this script which should output
# some summary statistics and some pretty plots, :)
# Generating some fake data
set.seed(123)
import sys
##########################################################
#requires snakemake, python3, pyfasta to be installed
#save this file and provide all the binaries and their path
#in variables below.
#to run flux pipeline:
#snakemake run_flux_pipeline
#to run rsem pipeline:
#snakemake run_rsem_pipeline
## RNA-seq analysis with DESeq2
## Stephen Turner, @genetics_blog
# RNA-seq data from GSE52202
# http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse52202. All patients with
# ALS, 4 with C9 expansion ("exp"), 4 controls without expansion ("ctl")
# Import & pre-process ----------------------------------------------------
# Import data from featureCounts
SAM and BAM filtering one-liners
@author: David Fredman, david.fredmanAAAAAA@gmail.com (sans poly-A tail)
@dependencies: http://sourceforge.net/projects/bamtools/ and http://samtools.sourceforge.net/
Please extend with additional/faster/better solutions via a pull request!
BWA mapping (using piping for minimal disk I/O)
#!/bin/bash
# Usage: interleave_fastq.sh f.fastq r.fastq > interleaved.fastq
#
# Interleaves the reads of two FASTQ files specified on the
# command line and outputs a single FASTQ file to STDOUT.
#
# Can interleave 100 million paired reads (200 million total
# reads; 2 x 22 GB files), in memory (/dev/shm), in 6m54s (414s)
#
# Latest code: https://gist.github.com/4544979
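The header above describes what the script does but the preview does not show how. As an illustration only, and not the shell script's own implementation, the same interleaving can be sketched in Python. It assumes every FASTQ record is exactly four lines and that both inputs list mates in the same order; the function names `read_records` and `interleave` are made up for this example.

```python
from itertools import islice, zip_longest


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward, reverse, out):
    """Write forward and reverse records alternately to `out`."""
    pairs = zip_longest(read_records(forward), read_records(reverse))
    for rec_f, rec_r in pairs:
        # zip_longest pads the shorter input with None, so a None here
        # means the two files do not contain the same number of reads.
        if rec_f is None or rec_r is None:
            raise ValueError("input files have different numbers of reads")
        out.writelines(rec_f)
        out.writelines(rec_r)
```

To mirror the shell usage, one would open the two input files and pass `sys.stdout` as `out`. Records are streamed one pair at a time, so memory use stays small regardless of file size.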
# the following two lines give a two-line status, with the current window highlighted
hardstatus alwayslastline
hardstatus string '%{= kG}[%{G}%H%? %1`%?%{g}][%= %{= kw}%-w%{+b yk} %n*%t%?(%u)%? %{-}%+w %=%{g}][%{B}%m/%d %{W}%C%A%{g}]'
# huge scrollback buffer
defscrollback 5000
# no welcome message
startup_message off