Reads were aligned to the 1000 Genomes (1kg) GRCh37 reference build with HISAT 0.1.5-beta [cite], and duplicates were marked with Picard tools 1.130 [cite]; both steps were run by the GCC (coauthor? van der Vries, G) using MOLGENIS [cite]. The data were further analysed by coordinate sorting with Picard tools, followed by read counting with htseq-count against the Ensembl 75 [cite] annotation, using only the last 500 bp of the transcript annotations to quantify expression. Because htseq-count [cite again?] does not use the PCR duplicate flag, reads carrying that flag were removed by hard filtering with SAMtools [cite].
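To make the hard-filtering step concrete, the sketch below shows the logic on uncompressed SAM text: a record is dropped when the duplicate bit (0x400, decimal 1024) is set in its FLAG field. This is only an illustration and not the command used in the analysis, which is not given here; with SAMtools the equivalent filter would typically be `samtools view -F 1024`. The function name `drop_duplicates` and the example records are hypothetical.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for PCR/optical duplicates (decimal 1024)


def drop_duplicates(sam_lines):
    """Yield header lines and alignments whose duplicate bit is not set."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not flag & DUPLICATE_FLAG:
            yield line


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t0\t1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t1024\t1\t100\t60\t50M\t*\t0\t0\t*\t*",  # duplicate
    "read3\t1040\t1\t200\t60\t50M\t*\t0\t0\t*\t*",  # duplicate, reverse strand
]
kept = list(drop_duplicates(example))
```

Testing the bit with `&` rather than comparing the whole FLAG value means duplicates are removed regardless of which other bits (strand, pairing) are set.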
Description of the GCC pipeline: https://github.com/molgenis/NGS_RNA/blob/NGS_RNA-3.2.4/protocols/QC_Report.sh#L229
Extraction of the last 500 bp of transcript annotations: https://github.com/mmterpstra/pipeline-util/blob/master/bin/GTfGet1000bpExonsBeforeTES.pl
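As an illustration of what restricting the annotation to the last 500 bp of a transcript can mean, the sketch below trims one transcript's exons to the 500 exonic bases nearest its 3' end. It is not a port of the Perl script linked above. It assumes GTF-style 1-based inclusive coordinates and assumes that the 500 bp are counted over exonic sequence only (introns skipped); the function name `last_bp_of_transcript` and the example coordinates are hypothetical.

```python
def last_bp_of_transcript(exons, strand, n_bp=500):
    """Trim exons to the n_bp exonic bases nearest the transcript's 3' end.

    exons  : list of (start, end) tuples, 1-based inclusive (GTF style)
    strand : "+" or "-"
    """
    # Walk the exons from the 3' end towards the 5' end:
    # highest coordinates first on "+", lowest first on "-".
    ordered = sorted(exons, reverse=(strand == "+"))
    kept, remaining = [], n_bp
    for start, end in ordered:
        if remaining <= 0:
            break
        length = end - start + 1
        if length <= remaining:
            # The whole exon fits inside the window.
            kept.append((start, end))
            remaining -= length
        else:
            # Keep only the part of the exon closest to the 3' end.
            if strand == "+":
                kept.append((end - remaining + 1, end))
            else:
                kept.append((start, start + remaining - 1))
            remaining = 0
    return sorted(kept)


# Made-up two-exon transcript, 200 bp and 400 bp long.
exons = [(100, 299), (1000, 1399)]
plus = last_bp_of_transcript(exons, "+")
minus = last_bp_of_transcript(exons, "-")
```

On the plus strand the 3' exon is kept whole and the upstream exon is cut to its last 100 bp; on the minus strand the lowest-coordinate exon is the 3' end, so the trimming runs in the opposite direction.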
Running the shell scripts: