chapmanb · December 5, 2015 18:14
diff --git a/lorena-mirqc.md b/lorena-mirqc.md
diff --git a/lorena-mirqc.patch b/lorena-mirqc.patch
 --- lorena-mirqc.md.orig	2015-12-05 11:29:26.456290998 -0500
 +++ lorena-mirqc.md	2015-12-05 13:11:01.364354674 -0500
 @@ -1,6 +1,15 @@
 small RNA-seq with bcbio-nextgen
 =============================
 
 +It would be good to have a few short sentences of introduction here to orient
 +readers and give them an overview of the whole post.
 +
 +- Why is small RNA-seq analysis important?
 +- What types of analysis do you provide (this is in the pipeline section below),
 +  in a sentence.
 +- What is bcbio?
 +- What will you show? Validation of the pipeline and demonstration of its capabilities.
 +
 [reproducible code](http://seqcluster.readthedocs.org/example_pipeline.html)
 
 [R code](https://github.com/lpantano/mypubs/blob/master/srnaseq/mirqc/ready_report.rmd)
 @@ -11,15 +20,13 @@
 --------------------
 
 We used samples from mirRQC project
 -[paper](http://www.nature.com/nmeth/journal/v11/n8/full/nmeth.3014.html). The
 -idea behind this project is to provide samples that we know how much relative
 -small RNAs they contain one respect another,  or/and even the presence or
 -absence of specific molecules, such as miRNAs. The main goal was to test
 -different platforms for miRNA detection, but I think these are great samples for
 -benchmarking any tools as well. I am using them to test how [bcbio-nextgen]()
 +[paper](http://www.nature.com/nmeth/journal/v11/n8/full/nmeth.3014.html).
 +This project provides samples with known, relative amounts of small RNAs, enabling
 +comparison of quantitation and detection of miRNAs. The main goal was to test
 +different platforms for miRNA detection, but these are also great samples for
 +benchmarking tools. I am using them to test how [bcbio-nextgen](http://github.com/chapmanb/bcbio-nextgen)
 works with small RNA data.
 
 -
 Quoted from the paper, samples are (see below figure, lower panel):
 
 > Universal Human miRNA reference RNA (Agilent Technologies, #750700), human
 @@ -40,7 +47,6 @@
 You can read more about bcbio [here](http://github.com/chapmanb/bcbio-nextgen).
 There are 4 main steps in the small RNA-seq pipeline:
 
 -
 * adapter removal
 * miRNA quantification
 * other small RNAs detection and quantification
 @@ -64,10 +70,10 @@
 
 ### other smallRNAs quantification
 `bcbio` uses [seqcluster](http://github.com/lpantano/seqcluster) to detect
 -unique units of transcription over the genome with the main advantage being the
 -possibility to detect small RNAs that may come from different places. Normally
 +unique units of transcription over the genome, allowing resolution of small
 +RNAs found in multiple genomic locations. Normally
 these small RNAs are dropped because they map multiple times on the genome and
 -require an special analysis to avoid bias in the quantification. Read more about
 +require special analysis to avoid bias in the quantification. Read more about
 why
 [other small RNAs are important](http://seqcluster.readthedocs.org/literature.html).
 
 @@ -75,11 +81,12 @@
 ### quality control metrics
 `bcbio` summarizes `Fastqc` metrics for each sample. Together with different
 metrics from the previous steps, the user has an idea of the quality of the
 -samples and the overall project. It includes the `fastqc` results, size
 +samples and the overall project. It includes `fastqc` results, size
 distribution after adapter removal and amount of small RNAs mapped to miRNAs,
 tRNA, rRNA, repeats among others. Other metrics like amount of data used until
 the end of the analysis, or warning flags if the data is noisy are provided by
 -`seqcluster` and included in the final _Rmd_ template report.
 +`seqcluster` and included in the final _Rmd_ template report. (TODO: Provide a
 +description and pointer to what Rmarkdown is)
 
 
 ### automatic report
 @@ -90,10 +97,8 @@
 
 
 ## Results
 -The advantage to use these samples is that we can measure the accuracy in the
 -quantification of the tools used inside `bcbio` and the detection of specific
 -miRs.
 -
 +The miRQC samples allow us to measure quantitation and detection
 +accuracy of specific miRs for the tools integrated in `bcbio`.
 
 ### size distribution
 The size distribution shows easily the quality of your data. In a normal small
 @@ -144,11 +149,11 @@
 
 miRNAs which B > A are 181 and 174 follows B > C > D
 
 -That is more than 95% of accuracy for miRs with more than 5 counts. 
 +That is more than 95% accuracy for miRs with more than 5 counts.
 
 ### specificity
 
 -To detect specificity we used samples that included specific miRNAs that are not
 +To evaluate specificity, we used samples that included specific miRNAs that are not
 normally expressed there. These samples were analyzed in a different
 [run](https://github.com/lpantano/seqcluster/blob/master/data/pipeline_example/mirqc/non_mirqc_bcbio.csv).
 
 @@ -184,6 +189,11 @@
 quality control|0:01|8|20
 report | 0| 1| 1
 
 +## Conclusion
 +
 +It would be nice to have a conclusion here, similar to the intro -- re-emphasize
 +the major points and suggest additional work.
 +
 # Thanks
 * [Harvard T.H. Chan School of Public Health](http://bioinformatics.sph.harvard.edu)
   for supporting the integration of small RNAseq pipeline in
step | time (h:mm) | total cores | total memory GB
--- | --- | --- | ---
organize samples | 0 | 1 | 1
trimming & miRNA | 0:21 | 8 | 20
prepare | 0:01 | 1 | 8
alignment | 0:07 | 6 | 42.1
cluster | 2:49 | 1 | 8
quality control | 0:01 | 8 | 20
report | 0 | 1 | 1
Total | 3:19 | |
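The timing table lists wall-clock time per pipeline step. The quick check below assumes the time column is h:mm, which is an inference rather than something the post states: under that reading the step times add up to exactly the 3:19 total. It also shows that the `cluster` step (presumably the seqcluster step) dominates the run. `STEP_TIMES`, `to_minutes` and `to_hmm` are hypothetical helper names for illustration, not part of bcbio.

```python
# Assumption: times are h:mm wall-clock values copied from the timing table.
STEP_TIMES = {
    "organize samples": "0",
    "trimming & miRNA": "0:21",
    "prepare": "0:01",
    "alignment": "0:07",
    "cluster": "2:49",
    "quality control": "0:01",
    "report": "0",
}


def to_minutes(hmm: str) -> int:
    """Convert an 'h:mm' string to whole minutes; a bare value is read as hours."""
    if ":" not in hmm:
        return int(hmm) * 60  # only '0' appears in the table, so this is 0
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hmm(total_minutes: int) -> str:
    """Format whole minutes back into 'h:mm'."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


# Sum every step and compute the fraction of the run spent clustering.
total = sum(to_minutes(t) for t in STEP_TIMES.values())
cluster_share = to_minutes(STEP_TIMES["cluster"]) / total

print(to_hmm(total))           # 3:19
print(f"{cluster_share:.0%}")  # 85%
```

Under that assumption the clustering step accounts for about 85% of the 3:19 run, while the other steps take minutes each.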