- Run the TranscriptomePGMakeDatabase workflow. Its input files are a Trinity assembly, the predicted proteins from TransDecoder, the GFF3 coordinates corresponding to the TransDecoder predictions, and the cRAP database of contaminants.
- Ensure that the `known_novel_crap_decoy.fasta` output from the above workflow is loaded onto Mascot for searching.
- Use the outputs from `TranscriptomePGMakeDatabase` to run the Transcriptome PG workflow. Your workflow should be based on the Transcriptome PG workflow but modified to include a Mascot search for your specific organism.
- Download the `observed_peptides.gff3` file that you get from running the previous workflow step.
- First, gather the following files in a directory on your laptop:

  | File | Description |
  | --- | --- |
  | observed_peptides.gff3 | Output from the Galaxy workflow above |
  | transdecoder.gff3 | GFF coordinates for transdecoder predictions (used as an input to the Galaxy workflow above) |
  | known.fasta | Output from the database generation workflow above |
  | novel.fasta | Output from the database generation workflow above |
  | crap.fasta | The cRAP contaminants (download) |

- Make sure bedtools is installed on your laptop.
- Extract only the `CDS` entries from the transdecoder file:

      grep 'CDS' transdecoder.gff3 > transdecoder_cds.gff3
- Compare the observed peptides with the transdecoder `CDS` entries and keep only those peptides that don't fully overlap with a transdecoder `CDS` prediction:

      bedtools subtract -s -f 1.0 -a observed_peptides.gff3 -b transdecoder_cds.gff3 > really_novel.gff3

- Download this filter_fasta.rb script and put it in the same directory as the files above. Make sure it has execute permissions, i.e.

      chmod u+x filter_fasta.rb
- Extract the IDs from really_novel.gff3 and then grab their corresponding entries from the novel.fasta file:

      awk -F '=' '{print $2}' really_novel.gff3 | awk -F '.' '{print "lcl|"$1}' | sort -u > really_novel_ids.txt
      ./filter_fasta.rb novel.fasta really_novel_ids.txt > really_novel.fasta
- Concatenate the novel proteins with known.fasta:

      cat really_novel.fasta known.fasta > known_updated.fasta
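
To see what the ID-extraction and FASTA-filtering steps above do, here is a minimal sketch on toy data. The GFF3 records, the sequence names and the `frame=` text in the headers are all made up for illustration. The second `awk` call is only an assumed stand-in for filter_fasta.rb (whose source is not shown here): it keeps every FASTA entry whose first header word appears in the ID list, and the real script may behave differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so real files are never overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Toy GFF3 records (hypothetical): column 9 holds ID=<transcript>.<suffix>
printf 'comp1_c0_seq1\tMSGFPlus\tpolypeptide\t10\t40\t.\t+\t.\tID=comp1_c0_seq1.p1\n'  > really_novel.gff3
printf 'comp7_c0_seq2\tMSGFPlus\tpolypeptide\t5\t25\t.\t-\t.\tID=comp7_c0_seq2.p3\n'  >> really_novel.gff3
printf 'comp1_c0_seq1\tMSGFPlus\tpolypeptide\t50\t70\t.\t+\t.\tID=comp1_c0_seq1.p2\n' >> really_novel.gff3

# Toy novel.fasta (hypothetical) with three entries
cat > novel.fasta <<'EOF'
>lcl|comp1_c0_seq1 frame=1
MKTAYIAK
QRQISFVK
>lcl|comp2_c0_seq1 frame=2
MSHHWGYG
>lcl|comp7_c0_seq2 frame=3
MADEEKLP
EOF

# Same ID extraction as in the step above: take the text after the first '=',
# drop everything from the first '.', add the "lcl|" prefix, de-duplicate.
awk -F '=' '{print $2}' really_novel.gff3 \
  | awk -F '.' '{print "lcl|"$1}' \
  | sort -u > really_novel_ids.txt

# Assumed stand-in for filter_fasta.rb: read the ID list first, then print
# each FASTA record whose first header word (minus '>') is in that list.
awk 'NR == FNR { keep[$1]; next }
     /^>/      { printing = (substr($1, 2) in keep) }
     printing' really_novel_ids.txt novel.fasta > really_novel.fasta

cat really_novel.fasta
```

Note that splitting on `.` truncates any transcript name that itself contains a dot, so it is worth checking a few lines of really_novel_ids.txt against the headers in novel.fasta before filtering.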
For MaxQuant, simply rename the database. Modify the example below for your species:

    cp known_updated.fasta SAustrinumMQ_20151908.fasta

To create a database for normal TPP decoy searching and for loading onto Mascot, add the cRAP contaminants and then generate decoys. For this you need to install protk, which provides the make_decoy.rb tool:

    cat crap.fasta known_updated.fasta > known_updated_crap.fasta
    make_decoy.rb -A known_updated_crap.fasta -o SAustrinum_20151908.fasta
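
A quick sanity check on the concatenated database can catch a common problem: if a FASTA file lacks a trailing newline, `cat` fuses its last sequence line with the next file's first header and one entry silently disappears. The sketch below is not part of the original protocol. It uses toy files with made-up entry names and a hypothetical helper, `count_entries`, to confirm that the header counts add up before decoys are generated.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count FASTA entries by counting header lines.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_entries() {
  grep -c '^>' "$1" || true
}

# Work in a scratch directory with toy files (made-up names and sequences).
workdir=$(mktemp -d)
cd "$workdir"
printf '>contaminant_1\nIVGGYTC\n' > crap.fasta
printf '>lcl|comp1_c0_seq1\nMKTAYIAK\n>known_protein_1\nMSHHWGYG\n' > known_updated.fasta

# Same concatenation as in the step above.
cat crap.fasta known_updated.fasta > known_updated_crap.fasta

n_crap=$(count_entries crap.fasta)
n_known=$(count_entries known_updated.fasta)
n_combined=$(count_entries known_updated_crap.fasta)

echo "crap=$n_crap known_updated=$n_known combined=$n_combined"

# The combined file should contain exactly the sum of the two inputs.
if [ "$n_combined" -ne $((n_crap + n_known)) ]; then
  echo "entry counts do not add up; check for a missing trailing newline" >&2
  exit 1
fi
```

The same `count_entries` call can be run on the final decoy database to compare its size with the input, although the expected ratio depends on how make_decoy.rb is configured.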