mmterpstra

After alignment with hisat 0.1.5-beta [cite] to the 1kg GRCh37 reference build and duplicate marking with Picard tools 1.130 [cite], both done by the gcc (coauthor? van der Vries, G) using molgenis [cite], the data were further analysed by coordinate sorting with Picard tools followed by htseq-count on the reads, using the ENSEMBL75 [cite] database and only the last 500 bp of transcript annotations for counting expression. Since htseq-count [cite again?] does not use the PCR duplicate flag, reads flagged as duplicates were hard-filtered with SAMtools [cite].
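To make the hard-filtering step concrete, here is a minimal sketch of what filtering on the PCR duplicate flag means at the SAM record level. It is an illustration, not the command used in the analysis: the function name `drop_duplicates` is hypothetical and the filter works on SAM text read from standard input. With SAMtools itself, the same selection is usually written as `samtools view -b -F 1024 in.bam > out.bam`, where the file names are placeholders.

```shell
#!/bin/sh
# Sketch only: drop SAM records whose FLAG has the duplicate bit set.
# The duplicate bit is 0x400 (decimal 1024) in the SAM FLAG field.
# drop_duplicates is a hypothetical helper, not part of SAMtools.
drop_duplicates() {
    while IFS= read -r line; do
        case "$line" in
            # Header lines start with '@' and are passed through unchanged.
            @*) printf '%s\n' "$line"; continue ;;
        esac
        # FLAG is the second tab-separated column of a SAM record.
        flag=$(printf '%s\n' "$line" | cut -f2)
        # Keep the record only when the duplicate bit is not set.
        if [ $(( flag & 1024 )) -eq 0 ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

The function reads SAM text on standard input, for example `samtools view -h in.bam | drop_duplicates`, again with a placeholder file name.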

Description of the gcc pipeline: https://github.com/molgenis/NGS_RNA/blob/NGS_RNA-3.2.4/protocols/QC_Report.sh#L229

Extraction of the last 500 bp of transcript annotations: https://github.com/mmterpstra/pipeline-util/blob/master/bin/GTfGet1000bpExonsBeforeTES.pl
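The idea of keeping only the last 500 bp of a transcript can be sketched as follows. This is a simplified illustration under stated assumptions, not the logic of the linked Perl script: the function name `last_bp` is hypothetical, the input is the exon list of a single transcript as tab-separated `start end` pairs (1-based, inclusive, as in GTF), and the strand and the number of bases are passed as arguments. Exons are walked backwards from the transcript end site until the requested number of bases is collected.

```shell
#!/bin/sh
# Sketch only: print the exon intervals covering the last N bp of a transcript.
# Usage (hypothetical helper): last_bp STRAND N < exons.tsv
# exons.tsv holds one exon per line: start<TAB>end (1-based, inclusive).
last_bp() {
    strand=$1
    want=$2
    # Order exons starting from the transcript end site:
    # highest end first on the + strand, lowest start first on the - strand.
    if [ "$strand" = "+" ]; then
        sort -k2,2nr
    else
        sort -k1,1n
    fi | awk -v strand="$strand" -v want="$want" '
        want > 0 {
            len  = $2 - $1 + 1                  # exon length in bp
            take = (len < want) ? len : want    # bases still needed from this exon
            if (strand == "+")
                printf "%d\t%d\n", $2 - take + 1, $2   # keep the 3-prime part
            else
                printf "%d\t%d\n", $1, $1 + take - 1   # 3-prime part on the - strand
            want -= take
        }'
}
```

For a + strand transcript with exons 100-199, 300-699 and 900-1099, `last_bp + 500` keeps all of 900-1099 (200 bp) and the last 300 bp of the preceding exon (400-699).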

running the shell scripts:

###splitfqbyCATG.pl Splits reads by CA-TG sequence. Because of difficulties with paired-end (PE) input, the output is written to a single file.

#Install Requires gzip and perl. On Ubuntu, installation is as easy as sudo apt install gzip perl && wget https://gist.githubusercontent.com/mmterpstra/2417200a96f841862c82220893490202/raw/b4172b5f907c46eb862c5387abca168fd6583388/splitfqbyCATG.pl

#Use perl splitfqbyCATG.pl [-q 0 ] reads_1.in.fq.gz reads.out.fq.gz [reads_2.in.fq.gz ]

###Example 1

	queue	mem	walltime	nodes	ppn	defaultInterpreter	stage	checkStage	WORKDIR	root	group	tmp	resDir	toolDir	projectDir	fastqcMod	bwaMod	picardMod	RMod	gatkMod	snpEffMod	varScanMod	samtoolsMod	vcfToolsMod	genomeLatSpecies	genomeSpecies	genomeBuild	genomeGrchBuild	ensemblVersion	onekgGenomeFasta	onekgGenomeFastaIdxBase	onekgGenomeFastaDict	goldStandardVcf	goldStandardVcfIdx	oneKgPhase1SnpsVcf	oneKgPhase1SnpsVcfIdx	oneKgPhase1IndelsVcf	oneKgPhase1IndelsVcfIdx	${resDir}/${genomeBuild}/variants/1000G_Phase3/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf.gz	dbsnpVcf	dbsnpVcfIdx	exacVcf	exacVcfIdx	indelRealignmentTargets	snpEffStats	motifBin	nextProtBin	pwmsBin	regulation_CD4Bin	regulation_GM06990Bin	regulation_GM12878Bin	regulation_H1ESCBin	regulation_HeLaS3Bin	regulation_HepG2Bin	regulation_HMECBin	regulation_HSMMBin	regulation_HUVECBin	regulation_IMR90Bin	regulation_K562bBin	regulation_K562Bin	regulation_NHABin	regulation_NHEKBin	snpEffectPredictorBin	dbnsfp	dbnsfpTbi	oneKgP1wgsVcf	oneKgP1wgsVcfIdx	c

Test report for easybuilders/easybuild-easyconfigs#1741

Test result

Build succeeded for 1 out of 1 (1 easyconfigs in this PR)

Overview of tested easyconfigs (in order)

SUCCESS cramtools-2.0-Java-1.7.0_80.eb

Time info

start: Wed, 01 Jul 2015 12:26:42 +0000 (UTC)

Test report for easybuilders/easybuild-easyconfigs#1741

Test result

Build succeeded for 1 out of 1 (1 easyconfigs in this PR)

Overview of tested easyconfigs (in order)

SUCCESS cramtools-2.0-Java-1.7.0_80.eb

Time info

start: Wed, 01 Jul 2015 11:28:54 +0000 (UTC)

Test report for easybuilders/easybuild-easyconfigs#1741

Test result

Build succeeded for 1 out of 1 (1 easyconfigs in this PR)

Overview of tested easyconfigs (in order)

SUCCESS cramtools-v2.0-Java-1.7.0_80.eb

Time info

start: Wed, 24 Jun 2015 14:59:21 +0000 (UTC)

	#!/usr/bin/perl
	use strict;
	use warnings;

	my $credits = "Written by Miente Martijn Terpstra from UMCG Department of Genetics subgroup oncogentics last modified [Thu Jul 4 16:35:29 CEST 2013]";


	my $exampleGTF = <<"END";
	chr1 HAVANA transcript 12010 13670 . + . transcript_id "ENST00000450305.2"; locus_id "chr1:11869-14412W"; gene_id "ENSG00000223972.4"; reads 0.000000; length 632; RPKM 0.000000
	chr1 HAVANA transcript 11869 14409 . + . transcript_id "ENST00000456328.2"; locus_id "chr1:11869-14412W"; gene_id "ENSG00000223972.4"; reads 232.000000; length 1657; RPKM 6.344208

	[Header]
	IEMFileVersion,4
	Investigator Name,JohnDoe
	Experiment Name,Experiment1
	Date,01/01/2000
	Workflow,GenerateFASTQ
	Application,FASTQ Only
	Assay,TruSeq LT
	Description,Nugene samples
	Chemistry,Default

	#!/usr/bin/env Rscript

	#maybe add command-line handling to script

	#Yeah, it is a little low I know.
	scrum.points=4

	scrum.startdate="2016/10/24"
	scrum.sprint="93"
	scrum.days=19

	use warnings;
	use strict;
	use Data::Dumper;
	main();

	sub main {
	InfoMsg("Use 'perl $0 /Path/To/Compute/Jobs/*.sh'\n");
	InfoMsg("Commandline : $0 ".join(" ",@ARGV)."\n");
	DuplicateParameterRemoval(@ARGV);
	}