- Run the TranscriptomePGMakeDatabase workflow. Its input files are a Trinity assembly, predicted proteins from TransDecoder, GFF3 coordinates corresponding to the TransDecoder predictions, and the cRAP database of contaminants.
- Ensure that the `known_novel_crap_decoy.fasta` output from the above workflow is loaded onto Mascot for searching.
- Use the outputs from `TranscriptomePGMakeDatabase` to run the Transcriptome PG workflow. Your workflow should be based on the Transcriptome PG workflow but modified to include a Mascot search for your specific organism.
- Download the `observed_peptides.gff3` file produced by the previous workflow step.
- First gather the following files in a directory on your laptop:

  | File | Description |
  | --- | --- |
  | observed_peptides.gff3 | Output from the Galaxy workflow above |
  | transdecoder.gff3 | GFF coordinates for TransDecoder predictions (used as an input to the Galaxy workflow above) |
  | known.fasta | Output from the database generation workflow above |
  | novel.fasta | Output from the database generation workflow above |
  | crap.fasta | The cRAP contaminants database, which you need to download |

- Make sure you install bedtools on your laptop.
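Before starting, it can help to confirm that all five files and bedtools are actually in place. The following is a minimal sketch, not part of the original workflow; `check_inputs` is a hypothetical helper name, and the file names are the ones used in the steps here.

```shell
# check_inputs is a hypothetical helper: it reports anything missing
# from the working directory and returns non-zero if something is absent.
check_inputs() {
    dir=$1
    missing=0
    for f in observed_peptides.gff3 transdecoder.gff3 known.fasta novel.fasta crap.fasta; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    # command -v succeeds only if bedtools is found on the PATH
    if ! command -v bedtools >/dev/null 2>&1; then
        echo "bedtools not found on PATH"
        missing=1
    fi
    return "$missing"
}

# Usage: check_inputs . && echo "ready"
```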
- Extract only `CDS` entries from the TransDecoder file:

      cat transdecoder.gff3 | grep 'CDS' > transdecoder_cds.gff3
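Note that `grep 'CDS'` keeps any line containing the string CDS anywhere, including in the attributes column. If that turns out to be too permissive for your file, a stricter variant tests only the feature-type column (column 3 in GFF3). This is a sketch under that assumption; `extract_cds` is a hypothetical helper name, not part of the original workflow.

```shell
# Keep only GFF3 rows whose feature type (column 3) is exactly "CDS".
# extract_cds is a hypothetical helper name used for illustration.
extract_cds() {
    awk -F '\t' '$3 == "CDS"' "$1"
}

# Usage: extract_cds transdecoder.gff3 > transdecoder_cds.gff3
```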
- Compare the observed peptides with the TransDecoder `CDS` entries and keep only those peptides that do not fully overlap a TransDecoder `CDS` prediction:

      bedtools subtract -s -f 1.0 -a observed_peptides.gff3 -b transdecoder_cds.gff3 > really_novel.gff3
- Download this filter_fasta.rb script and put it in the same directory as the files above. Make sure it has execute permissions, i.e.

      chmod u+x filter_fasta.rb
- Extract the IDs from really_novel.gff3 and then grab their corresponding entries from the novel.fasta file:

      cat really_novel.gff3 | awk -F '=' '{print $2}' | awk -F '.' '{print "lcl|"$1}' | sort -u > really_novel_ids.txt
      ./filter_fasta.rb novel.fasta really_novel_ids.txt > really_novel.fasta
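To see what the ID-extraction pipeline above does to a single record, the sketch below runs the same two awk stages on one made-up GFF3 line. The sample line, including its ID format, is an assumption for illustration only; real IDs from your workflow may look different. `extract_ids` is a hypothetical wrapper name.

```shell
# One hypothetical GFF3 record; the ID value is invented for illustration.
sample=$(printf 'contig1\tMSMS\tpolypeptide\t10\t40\t.\t+\t.\tID=seq42_orf1.p3;Name=pep')

# Stage 1 splits on "=" and keeps the text between the first and second "=".
# Stage 2 splits on "." and keeps the part before the first ".", prefixed with "lcl|".
# sort -u removes duplicate IDs.
extract_ids() {
    awk -F '=' '{print $2}' | awk -F '.' '{print "lcl|"$1}' | sort -u
}

printf '%s\n' "$sample" | extract_ids
# prints: lcl|seq42_orf1
```

Lines without an `=` (such as a `##gff-version 3` header) come out as a bare `lcl|`, so it is worth glancing at really_novel_ids.txt before filtering.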
- Concatenate the novel proteins with known.fasta:

      cat really_novel.fasta known.fasta > known_updated.fasta
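A quick sanity check after concatenation is that the number of entries in known_updated.fasta equals the sum of its two inputs. A minimal sketch, where `count_fasta` is a hypothetical helper name:

```shell
# Count FASTA records by counting header lines (those starting with ">").
# Note: grep -c exits non-zero when the count is 0, although it still prints 0.
count_fasta() {
    grep -c '^>' "$1"
}

# Usage (file names as in the steps above):
#   echo $(( $(count_fasta really_novel.fasta) + $(count_fasta known.fasta) ))
#   count_fasta known_updated.fasta
# The two numbers should match.
```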
For MaxQuant, simply copy the database under a new name. Modify the example below for your species:

    cp known_updated.fasta SAustrinumMQ_20151908.fasta
To create a database for normal TPP decoy searching and for loading onto Mascot, add the cRAP contaminants and then generate decoys. For this you need to install protk, which provides the make_decoy.rb tool:

    cat crap.fasta known_updated.fasta > known_updated_crap.fasta
    make_decoy.rb -A known_updated_crap.fasta -o SAustrinum_20151908.fasta