Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.
#!/bin/bash
# Check if input string is provided
if [ -z "$1" ]
then
    echo "Please provide a string input";
    exit 1;
fi
# Tell the user i am thinking
system('curl https://trace.ncbi.nlm.nih.gov/Traces/sra/sra_stat.cgi > /tmp/stats.csv')
st <- read.table('/tmp/stats.csv', sep=',', header=T)
st$date <- as.Date(st$date, format='%m/%d/%Y')
i <- min(which(st$bases >= 0.5625e16))
id1 <- i
id2 <- min(which(st$bases >= 1.125e16))
id3 <- min(which(st$bases >= 2.25e16))
id4 <- min(which(st$bases >= 4.5e16))
id5 <- min(which(st$bases >= 8.95e16))
plot(st$date[id1:id5], log10(st$bases[id1:id5]), type='l', xlab="Date", ylab="log10(Total SRA bases)")
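The R snippet above finds the first row at which the cumulative base count reaches each of a series of thresholds, each roughly double the previous one. As a rough illustration of that idea, here is a minimal Python sketch. The function names `first_crossing` and `crossing_intervals` and the toy data are assumptions made for this example; they are not part of the original script, and nothing here reads the real SRA statistics.

```python
from datetime import date


def first_crossing(values, threshold):
    """Return the index of the first value >= threshold, or None if none qualifies."""
    for i, v in enumerate(values):
        if v >= threshold:
            return i
    return None


def crossing_intervals(dates, values, thresholds):
    """Return the number of days between successive threshold crossings.

    `dates` and `values` are parallel lists; `values` is assumed to be a
    cumulative (non-decreasing) series, as total bases would be.
    """
    indices = [first_crossing(values, t) for t in thresholds]
    if any(i is None for i in indices):
        raise ValueError("at least one threshold is never reached")
    return [(dates[b] - dates[a]).days for a, b in zip(indices, indices[1:])]


if __name__ == "__main__":
    # Hypothetical toy series, not real SRA data.
    toy_dates = [date(2020, 1, 1), date(2020, 7, 1), date(2021, 1, 1), date(2021, 7, 1)]
    toy_values = [1.0, 2.0, 3.0, 4.5]
    print(crossing_intervals(toy_dates, toy_values, [1.0, 2.0, 4.0]))
```

When thresholds double from one to the next, the returned intervals approximate the doubling time of the series over each stretch. Unlike `min(which(...))` in R, this sketch raises an error when a threshold is never reached.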
** Step 1 **
Install ffmpeg with the vidstab plugin.
- OSX: Install via Homebrew:
brew install ffmpeg --with-libvidstab
- Linux: download binaries here (vidstab included)
- Windows: download binaries here (vidstab included)
Source | Dst. file type | Protocol | Time (s) | Command Line |
---|---|---|---|---|
NCBI | .sra | ftp | 296 | wget |
NCBI | .fastq.gz | sra toolkit | ~23000 | fastq-dump -Z --gzip --split-spot |
local file | sra=>fastq.gz | sra toolkit | ~15000 | fastq-dump --gzip --split-spot --split-3 |
EBI | .fastq.gz | aspera | 513+492 | aspera -QT -l 300m |
EBI | .fastq.gz | ftp | 1876+1946 | wget |
Notes:
Introduction/tl;dr: I wrote this post as a reference for a few new graduate students in my department who are getting started with RNA-seq data analysis. It begins with an informal, big-picture overview of RNA-seq data analysis, and the general flow of the post outlines one standard RNA-seq workflow, but I wanted to give general audiences a "heads-up" that the post goes into quite a bit of nitty-gritty detail that's specific to our department's computing setup.
RNA-seq is a high-throughput technology used to measure gene expression in cell populations. For a super bare-bones picture of what gene expression is, please enjoy this ASCII art I made to illustrate the process:
[DNA] ACGTAGGT{CGTATTT}AGCGT{AGCGCCCGA}TTACA
|
cPickle default dumps: 0.107237100601
cPickle HIGHEST_PROTOCOL dumps: 0.0678668022156
marshal dumps: 0.0203359127045
cPickle default loads: 0.0411729812622
cPickle HIGHEST_PROTOCOL loads: 0.0352649688721
marshal loads: 0.0221829414368
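The benchmark that produced these timings is not shown, and the `cPickle` labels suggest it ran on Python 2. The sketch below shows one way a comparable measurement could be set up on Python 3, where `pickle` uses its C implementation automatically when available. The payload, the `make_payload` and `benchmark` names, and the repeat count are all assumptions for illustration, so the numbers it prints will not match the ones above.

```python
import marshal
import pickle
import timeit


def make_payload(n=1000):
    """Build a hypothetical payload: a list of small dicts of builtin types."""
    return [{"id": i, "name": "item%d" % i, "score": i * 0.5} for i in range(n)]


def benchmark(payload, repeats=20):
    """Return total seconds spent on `repeats` runs of each operation."""
    # Serialize once up front so the loads cases time deserialization only.
    default_blob = pickle.dumps(payload)
    highest_blob = pickle.dumps(payload, pickle.HIGHEST_PROTOCOL)
    marshal_blob = marshal.dumps(payload)

    cases = {
        "pickle default dumps": lambda: pickle.dumps(payload),
        "pickle HIGHEST_PROTOCOL dumps": lambda: pickle.dumps(payload, pickle.HIGHEST_PROTOCOL),
        "marshal dumps": lambda: marshal.dumps(payload),
        "pickle default loads": lambda: pickle.loads(default_blob),
        "pickle HIGHEST_PROTOCOL loads": lambda: pickle.loads(highest_blob),
        "marshal loads": lambda: marshal.loads(marshal_blob),
    }
    return {label: timeit.timeit(fn, number=repeats) for label, fn in cases.items()}


if __name__ == "__main__":
    for label, seconds in benchmark(make_payload()).items():
        print("%s: %.6f" % (label, seconds))
```

Note that `marshal` only handles a fixed set of builtin types and its format is not guaranteed to be stable across Python versions, so a speed advantage does not make it a general replacement for `pickle`.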
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, const char *argv[])
{