Last active August 29, 2015 14:18
Gist antonkulaga/d0c76a2cd6e6de55b9af
transcriptome assembly with Scythe-Sickle-Hisat-Stringtie-Ballgown
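###IMPROVING FASTQ (note: apply the same steps to all your fastq files) ###
#removing Illumina adapters by https://github.com/vsbuffalo/scythe
#(-a takes an adapter FASTA, e.g. illumina_adapters.fa from the scythe repository; -q sanger is the quality encoding)
scythe -a illumina_adapters.fa -q sanger -o /home/uploader/flies/assembly4/3_cleaned.fastq /home/uploader/flies/assembly4/3.fastq
#trimming low-quality read ends by https://github.com/najoshi/sickle (-q 25 is the quality threshold; sickle's default is 20)
#note: -o below overwrites the raw 3.fastq with the trimmed reads, so keep a copy of the raw file if you need it
sickle se -f /home/uploader/flies/assembly4/3_cleaned.fastq -t sanger -o /home/uploader/flies/assembly4/3.fastq -q 25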
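#The note at the top asks for the same cleaning on every FASTQ file. This is a minimal
#sketch of that loop. It assumes the raw reads are *.fastq files in one directory and that
#an adapter FASTA (e.g. illumina_adapters.fa) is passed in. clean_all_fastq and run_step
#are hypothetical helper names, not part of scythe or sickle.
#Unlike the commands above, it writes trimmed reads to <name>_trimmed.fastq, so the raw
#files are never overwritten. Point hisat -U at those files afterwards.
#With DRY_RUN=1 it only prints the commands it would run.
run_step() {
  echo "$*"                                   # show the command
  if [ -z "${DRY_RUN:-}" ]; then "$@"; fi     # run it unless in dry-run mode
}

clean_all_fastq() {
  local dir="$1" adapters="$2" raw base
  for raw in "$dir"/*.fastq; do
    [ -e "$raw" ] || continue                 # the glob matched nothing
    case "$raw" in
      *_cleaned.fastq|*_trimmed.fastq) continue ;;  # skip outputs of earlier runs
    esac
    base="${raw%.fastq}"
    # adapter removal (scythe), then quality trimming (sickle)
    run_step scythe -a "$adapters" -q sanger -o "${base}_cleaned.fastq" "$raw"
    run_step sickle se -f "${base}_cleaned.fastq" -t sanger -o "${base}_trimmed.fastq" -q 25
  done
}

#usage: DRY_RUN=1 clean_all_fastq /home/uploader/flies/assembly4 illumina_adapters.fa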
### Hisat http://ccb.jhu.edu/software/hisat/manual.shtml ###
#building hisat index
hisat-build dmel.fasta dmel.hisat
#extracting known splice sites from the reference annotation to ease alignment for hisat
python extract_splice_sites.py dmel.gtf > splicesites.txt
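#What ends up in splicesites.txt: as far as I can tell from the HISAT script, there is one
#line per unique intron. Each line holds the chromosome, the 0-based position of the last
#base of the upstream exon, the 0-based position of the first base of the downstream exon,
#and the strand, all tab-separated.
#Below is a simplified awk sketch of that idea. gtf_splice_sites is a hypothetical name.
#Unlike the real script, it does not merge exons separated by tiny gaps. Keep using
#extract_splice_sites.py for the actual alignment and read this only as an illustration.
gtf_splice_sites() {
  # 1) keep exon lines and emit: transcript_id, chrom, start, end, strand
  awk -F'\t' '$3 == "exon" {
      if (match($9, /transcript_id "[^"]+"/)) {
        tid = substr($9, RSTART + 15, RLENGTH - 16)   # text between the quotes
        print tid "\t" $1 "\t" $4 "\t" $5 "\t" $7
      }
    }' "$1" |
  # 2) group exons by transcript, ordered by start coordinate
  sort -t$'\t' -k1,1 -k3,3n |
  # 3) each pair of consecutive exons in a transcript defines one intron
  awk -F'\t' '
    $1 == prev_tid { print $2 "\t" (prev_end - 1) "\t" ($3 - 1) "\t" $5 }
    { prev_tid = $1; prev_end = $4 }' |
  # 4) introns shared by several transcripts are reported once
  sort -u -t$'\t' -k1,1 -k2,2n -k3,3n -k4,4
}

#usage: gtf_splice_sites dmel.gtf | head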
#aligning reads
hisat -x dmel.hisat -U 3.fastq -S 3_hisat.sam --known-splicesite-infile splicesites.txt
#12299676 reads; of these:
# 12299676 (100.00%) were unpaired; of these:
# 1665253 (13.54%) aligned 0 times
# 8570171 (69.68%) aligned exactly 1 time
# 2064252 (16.78%) aligned >1 times
#86.46% overall alignment rate
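#The overall rate is the share of reads that aligned at least once:
#(8570171 + 2064252) / 12299676 = 86.46%.
#This is a small sketch that recomputes it from a saved copy of the summary. It assumes the
#summary was captured with 2> 3_hisat.log, since HISAT prints it to stderr. It only handles
#the unpaired layout shown above. alignment_rate is a hypothetical helper name.
alignment_rate() {
  awk '/reads; of these:/        { total = $1 }   # total number of reads
       /aligned exactly 1 time/  { once  = $1 }   # uniquely aligned reads
       /aligned >1 times/        { multi = $1 }   # multi-mapped reads
       END { if (total > 0) printf "%.2f\n", 100 * (once + multi) / total }' "$1"
}

#usage: alignment_rate 3_hisat.log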
### StringTie ###
#conversion to BAM by samtools http://www.htslib.org
samtools view -S -b 3_hisat.sam > 3.bam
#sorting (samtools 1.3+ syntax; older releases used: samtools sort 3.bam 3_sorted)
samtools sort -o 3_sorted.bam 3.bam
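#Piping the two samtools steps together avoids writing the unsorted 3.bam to disk.
#This sketch assumes samtools 1.3+, where sort accepts "-" for stdin and -o for the output
#file. sam_to_sorted_bam is a hypothetical helper name. With DRY_RUN=1 it prints the
#pipeline instead of running it.
sam_to_sorted_bam() {
  local sam="$1" out="$2"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "samtools view -b $sam | samtools sort -o $out -"
    return 0
  fi
  # pipefail (set inside a subshell so it does not leak) makes a failing
  # samtools view fail the whole step, not only samtools sort
  ( set -o pipefail; samtools view -b "$sam" | samtools sort -o "$out" - )
}

#usage: sam_to_sorted_bam 3_hisat.sam 3_sorted.bam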
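#GTF creation
#generating GTFs as well as the coverage tables required for the downstream differential expression analysis:
#-B writes Ballgown table files next to the -o output, -C writes the reference transcripts fully covered by reads, -m 100 sets the minimum transcript length
stringtie 3_sorted.bam -G dmel.gtf -o 3.gtf -v -m 100 -C 3_cov.gtf -B
#if we only need the GTFs, this is enough: stringtie 3_sorted.bam -G dmel.gtf -o 3.gtf -v -m 100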
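#The merge step below assumes one folder per sample (transcripts/<name>). Because -B writes
#its tables next to the -o file, per-sample folders also keep those tables apart.
#This is a sketch only. It assumes every sample has a <name>_sorted.bam in one directory and
#reuses the options from the command above. stringtie_all is a hypothetical helper name.
#With DRY_RUN=1 it prints the commands and creates no folders.
stringtie_all() {
  local bamdir="$1" outroot="$2" bam name
  local -a cmd
  for bam in "$bamdir"/*_sorted.bam; do
    [ -e "$bam" ] || continue                  # no sorted BAMs found
    name=$(basename "$bam" _sorted.bam)        # e.g. 3_sorted.bam -> 3
    cmd=(stringtie "$bam" -G dmel.gtf -o "$outroot/$name/$name.gtf"
         -v -m 100 -C "$outroot/$name/${name}_cov.gtf" -B)
    echo "${cmd[*]}"
    if [ -z "${DRY_RUN:-}" ]; then
      mkdir -p "$outroot/$name" && "${cmd[@]}"
    fi
  done
}

#usage: stringtie_all . /home/uploader/flies/assembly4/transcripts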
#creating a list of files for cuffmerge (the cuffmerge call below reads it from /home/uploader/flies/assembly4/transcripts/merge.txt, so create it there)
touch merge.txt
nano merge.txt
#then add the paths of the GTFs to merge, one per line
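#Instead of typing the paths into nano, the list can be generated.
#This sketch assumes the transcripts/<name>/ layout described in the merge comment below.
#It only looks one folder deep, so the reference dmel.gtf sitting directly in transcripts/
#is not listed. It also skips the -C coverage files (*_cov.gtf), which are not assemblies.
#Paths are written one per line, prefixed with the given root.
#write_merge_list is a hypothetical helper name.
write_merge_list() {
  local root="$1" list="$2" gtf
  : > "$list"                                 # start from an empty list
  for gtf in "$root"/*/*.gtf; do
    [ -e "$gtf" ] || continue                 # no per-sample GTFs found
    case "$gtf" in *_cov.gtf) continue ;; esac
    echo "$gtf" >> "$list"
  done
}

#usage: write_merge_list /home/uploader/flies/assembly4/transcripts /home/uploader/flies/assembly4/transcripts/merge.txt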
#merging all created GTF files (for the sake of simplicity, assume they are in /home/uploader/flies/assembly4/transcripts/<name> folders, while the FASTA files are in /home/uploader/flies/assembly)
cuffmerge -o /home/uploader/flies/assembly4/transcripts -g /home/uploader/flies/assembly4/transcripts/dmel.gtf -s /home/uploader/flies/assembly4/dmel.fasta -p 4 /home/uploader/flies/assembly4/transcripts/merge.txt
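#cuffmerge runs on whatever merge.txt lists, so a mistyped path only surfaces once the merge runs.
#This is a small pre-flight check to run first. It reports every listed file that does not
#exist and returns non-zero if any are missing. Blank lines are ignored.
#check_merge_list is a hypothetical helper name.
check_merge_list() {
  local list="$1" path missing=0
  # the "|| [ -n ]" part also reads a last line that has no trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done < "$list"
  return "$missing"
}

#usage: check_merge_list /home/uploader/flies/assembly4/transcripts/merge.txt && cuffmerge ...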
### Ballgown analysis ###