Rayan Chikhi rchikhi

Genomics - A programmer's guide.

Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.

https://www.genomicsplc.com

Video stabilization using VidStab and FFMPEG

** Step 1 **

Install ffmpeg with the vidstab plugin.

OSX: Install via Homebrew - brew install ffmpeg --with-libvidstab
Linux: download binaries here (vidstab included)
Windows: download binaries here (vidstab included)
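To confirm that an installed ffmpeg actually includes the plugin, you can look for its two filters, `vidstabdetect` and `vidstabtransform`, in the output of `ffmpeg -hide_banner -filters`. The Python sketch below does that check. It also builds the usual two-pass command lines: pass 1 writes motion data to a transforms file, and pass 2 reads that file and applies the smoothing. The file names and the `shakiness`/`accuracy`/`smoothing` values are illustrative assumptions, not settings taken from this guide.

```python
import shutil
import subprocess


def has_vidstab(filters_listing: str) -> bool:
    """Return True if both vidstab filters appear in `ffmpeg -filters` output."""
    names = set()
    for line in filters_listing.splitlines():
        parts = line.split()
        # Filter rows look like: " T.. vidstabdetect  V->V  Extract relative ..."
        if len(parts) >= 2:
            names.add(parts[1])
    return {"vidstabdetect", "vidstabtransform"} <= names


def stabilize_commands(src: str, dst: str, trf: str = "transforms.trf") -> list:
    """Build the two ffmpeg invocations for vidstab (illustrative settings)."""
    detect = [
        "ffmpeg", "-i", src,
        "-vf", f"vidstabdetect=shakiness=5:accuracy=15:result={trf}",
        "-f", "null", "-",  # pass 1 only analyses motion; video output is discarded
    ]
    transform = [
        "ffmpeg", "-i", src,
        "-vf", f"vidstabtransform=input={trf}:smoothing=30",
        dst,
    ]
    return [detect, transform]


if __name__ == "__main__":
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH")
    else:
        listing = subprocess.run(
            ["ffmpeg", "-hide_banner", "-filters"],
            capture_output=True, text=True,
        ).stdout
        print("vidstab available:", has_vidstab(listing))
```

You can pass each list returned by `stabilize_commands` directly to `subprocess.run`, running the first list before the second.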

Source	Destination file type	Protocol	Time (s)	Command line
NCBI	.sra	ftp	296	wget
NCBI	.fastq.gz	sra toolkit	~23000	fastq-dump -Z --gzip --split-spot
local file	.sra => .fastq.gz	sra toolkit	~15000	fastq-dump --gzip --split-spot --split-3
EBI	.fastq.gz	aspera	513+492	ascp -QT -l 300m
EBI	.fastq.gz	ftp	1876+1946	wget
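To compare the rows directly, you can total each one. For the EBI rows, this sketch assumes the two numbers joined by `+` are per-file transfer times, most likely the two files of a paired-end run, and adds them. Where the table gives `~` values, it uses them as-is. It then reports each method's speedup over the NCBI `fastq-dump` route.

```python
# Transfer times in seconds, copied from the table above.
# EBI rows list two per-file times; we assume the total is their sum.
TIMINGS = {
    "NCBI .sra via ftp (wget)": [296],
    "NCBI .fastq.gz via fastq-dump": [23000],  # approximate (~)
    "local .sra => .fastq.gz via fastq-dump": [15000],  # approximate (~)
    "EBI .fastq.gz via aspera (ascp)": [513, 492],
    "EBI .fastq.gz via ftp (wget)": [1876, 1946],
}


def speedups(timings, baseline):
    """Return {method: (total_seconds, speedup relative to baseline)}."""
    base = sum(timings[baseline])
    return {name: (sum(t), base / sum(t)) for name, t in timings.items()}


if __name__ == "__main__":
    table = speedups(TIMINGS, "NCBI .fastq.gz via fastq-dump")
    for name, (total, ratio) in table.items():
        print(f"{name:42s} {total:6d} s  {ratio:5.1f}x")
```

By these numbers, Aspera from EBI (1005 s in total) is roughly 23 times faster than `fastq-dump` from NCBI. The plain `.sra` download looks fastest on its own, but it still needs the ~15000 s local conversion to reach `.fastq.gz`.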

Notes:

Analyzing RNA-seq data with the "Tuxedo" tools

Introduction/tl;dr: I wrote this post as a reference for a few new graduate students in my department who are getting started with RNA-seq data analysis. It begins with an informal, big-picture overview of RNA-seq data analysis, and the rest of the post walks through one standard RNA-seq workflow. A heads-up for general audiences: the post goes into quite a bit of nitty-gritty detail that is specific to our department's computing setup.

Preliminaries: what's RNA-seq?

RNA-seq is a high-throughput sequencing technology used to measure gene expression in cell populations. For a super bare-bones picture of what gene expression is, please enjoy this ASCII art I made to illustrate the process:

[DNA]            ACGTAGGT{CGTATTT}AGCGT{AGCGCCCGA}TTACA
                                    |
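Here is the same idea as code: a toy model that treats the bracketed segments of the DNA line above as exons. It splices them together and swaps T for U to produce the mature mRNA sequence. Reading `{...}` as exons is an assumption about the diagram. The model is also deliberately simplified: it treats the shown strand as the coding strand and ignores promoters, the template strand, capping and polyadenylation.

```python
import re

DNA = "ACGTAGGT{CGTATTT}AGCGT{AGCGCCCGA}TTACA"


def mature_mrna(annotated_dna: str) -> str:
    """Splice the {exon} segments together and transcribe T -> U.

    Assumes braces mark exons and everything outside them is intronic or
    untranscribed; an illustration, not a biological model.
    """
    exons = re.findall(r"\{([ACGT]+)\}", annotated_dna)
    return "".join(exons).replace("T", "U")


print(mature_mrna(DNA))  # CGUAUUUAGCGCCCGA
```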

	#!/bin/bash

	# Exit with an error if no input string was given as the first argument
	if [ -z "$1" ]; then
		echo "Please provide a string input" >&2
		exit 1
	fi

	# Tell the user I am thinking

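	# Download SRA growth statistics as CSV (the code below uses its date and bases columns)
	system('curl https://trace.ncbi.nlm.nih.gov/Traces/sra/sra_stat.cgi > /tmp/stats.csv')
	st <- read.table('/tmp/stats.csv', sep=',', header=TRUE)
	st$date <- as.Date(st$date, format='%m/%d/%Y')
	# First row at which the total reaches each threshold; each threshold roughly doubles the previous one
	id1 <- min(which(st$bases >= 0.5625e16))
	id2 <- min(which(st$bases >= 1.125e16))
	id3 <- min(which(st$bases >= 2.25e16))
	id4 <- min(which(st$bases >= 4.5e16))
	id5 <- min(which(st$bases >= 8.95e16))
	# Plot log10(total bases) over the window spanning those doublings
	plot(st$date[id1:id5], log10(st$bases[id1:id5]), type='l', xlab="Date", ylab="log10(Total SRA bases)")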
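Each threshold in the R snippet is roughly twice the previous one: 0.5625e16, 1.125e16, 2.25e16, 4.5e16, then about 9e16. The dates at which the archive crosses them therefore give the SRA's doubling time directly. Below is a Python sketch of that calculation. It assumes, as the R code implies, that the CSV has `date` (in `%m/%d/%Y` format) and `bases` columns. It sorts rows by date, whereas the R code relies on the file already being in date order.

```python
import csv
import os
from datetime import datetime

# Thresholds from the R snippet; each is (roughly) twice the previous one.
THRESHOLDS = [0.5625e16, 1.125e16, 2.25e16, 4.5e16, 8.95e16]


def load_stats(path):
    """Read (date, total_bases) pairs from the SRA stats CSV, oldest first."""
    with open(path, newline="") as fh:
        rows = [
            (datetime.strptime(r["date"], "%m/%d/%Y").date(), float(r["bases"]))
            for r in csv.DictReader(fh)
        ]
    return sorted(rows)


def crossing_dates(rows, thresholds):
    """For each threshold, the date of the first row whose total reaches it."""
    dates = []
    for t in thresholds:
        hit = next((d for d, bases in rows if bases >= t), None)
        if hit is None:
            raise ValueError(f"total never reaches {t:.4g} bases")
        dates.append(hit)
    return dates


def doubling_days(dates):
    """Days elapsed between consecutive threshold crossings."""
    return [(b - a).days for a, b in zip(dates, dates[1:])]


if __name__ == "__main__":
    path = "/tmp/stats.csv"  # written by the curl call in the R snippet
    if os.path.exists(path):
        print(doubling_days(crossing_dates(load_stats(path), THRESHOLDS)))
```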

	cPickle default dumps: 0.107237100601
	cPickle HIGHEST_PROTOCOL dumps: 0.0678668022156
	marshal dumps: 0.0203359127045
	cPickle default loads: 0.0411729812622
	cPickle HIGHEST_PROTOCOL loads: 0.0352649688721
	marshal loads: 0.0221829414368
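The timings above come from Python 2's `cPickle` and `marshal`. The sketch below shows a comparable Python 3 measurement, where `cPickle` is simply `pickle`. The sample payload and repeat count are arbitrary assumptions, so the absolute numbers will differ from those above. In Python 3 the default pickle protocol is already a binary one, so the gap between the two pickle rows is likely smaller than in Python 2, whose default protocol was the text-based protocol 0. `marshal`'s format is also Python-version-specific and intended for internal use, so its speed does not make it a general replacement for `pickle`.

```python
import marshal
import pickle
import timeit

# Hypothetical sample payload; the object used for the original timings is unknown.
PAYLOAD = {i: [str(i), float(i), (i, i + 1)] for i in range(10_000)}


def benchmark(obj, number=10):
    """Return seconds taken by `number` dumps/loads calls for each serializer."""
    serializers = {
        "pickle default": (lambda o: pickle.dumps(o), pickle.loads),
        "pickle HIGHEST_PROTOCOL": (
            lambda o: pickle.dumps(o, protocol=pickle.HIGHEST_PROTOCOL),
            pickle.loads,
        ),
        "marshal": (marshal.dumps, marshal.loads),
    }
    results = {}
    for name, (dumps, loads) in serializers.items():
        blob = dumps(obj)
        if loads(blob) != obj:  # sanity check: the round trip must be lossless
            raise ValueError(f"{name} round trip changed the object")
        results[f"{name} dumps"] = timeit.timeit(lambda: dumps(obj), number=number)
        results[f"{name} loads"] = timeit.timeit(lambda: loads(blob), number=number)
    return results


if __name__ == "__main__":
    for label, seconds in benchmark(PAYLOAD).items():
        print(f"{label}: {seconds:.4f}")
```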

	#include <stdlib.h>
	#include <stdio.h>
	#include <stdint.h>
	#include <fcntl.h>
	#include <sys/stat.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(int argc, const char *argv[])
	{