Generate a range of standard quality metrics with fastqc
module load FastQC
mkdir fastqc
fastqc *.fastq.gz -o fastqc
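A slightly more defensive variant of the commands above, as a sketch only. The function name `run_fastqc` is hypothetical; it assumes `fastqc` is already on the PATH (for example after `module load FastQC`) and that the reads are in the current directory.

```shell
# Hypothetical helper: run FastQC on all .fastq.gz files in the current directory.
# Usage: run_fastqc [output_dir]   (output_dir defaults to "fastqc")
run_fastqc() {
    outdir=${1:-fastqc}
    # Replace the function's arguments with the matching files
    set -- *.fastq.gz
    # If the glob matched nothing it stays literal, so this test fails
    [ -e "$1" ] || { echo "no .fastq.gz files found" >&2; return 1; }
    # -p avoids an error when the directory already exists
    mkdir -p "$outdir"
    fastqc "$@" -o "$outdir"
}
```

Calling `run_fastqc` with no arguments should behave like the three commands above, except that it stops early when there is nothing to process.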
This repository contains setup instructions, maintenance scripts and links to data for setting up a GBrowse instance on Ubuntu.
From a bare Ubuntu 14 image, GBrowse can be installed very easily. To install everything:
sudo apt-get install gbrowse gbrowse-calign gbrowse-data libbio-samtools-perl apache2
To get you and your computer ready, follow instructions under Setup at https://jcu-eresearch.github.io/2016-09-27-SoftwareCarpentry-tsv/
There are three sections we covered: the Unix Shell (using Bash), R (basic, graphs, and loops), and Git/GitHub.
This script is needed for programs like SecretomeP that truncate FASTA ids.
We need to be able to uniquely identify each FASTA entry, so this script renames the ids with a numeric scheme.
It also produces a mapping file from old to new ids so the original ids can be recovered later.
Use it like this:
./rename_fasta.rb yourfasta.fasta
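The idea can be sketched in shell with awk. This is an illustration of the approach, not the contents of `rename_fasta.rb`: the function name `rename_ids`, the `seq_N` naming scheme and the tab-separated mapping layout are all assumptions made for the example.

```shell
# Hypothetical sketch: rename FASTA ids numerically and record old -> new ids.
# Usage: rename_ids INPUT.fasta OUTPUT.fasta MAPPING.tsv
rename_ids() {
    awk -v map="$3" '
        /^>/ {
            n++
            old = substr($0, 2)                        # header without the leading ">"
            printf("%s\t%s\n", old, "seq_" n) > map    # old id, tab, new id
            print ">seq_" n
            next
        }
        { print }                                      # sequence lines pass through unchanged
    ' "$1" > "$2"
}
```

Here the whole original header line is stored as the old id, so nothing is lost when the new ids are later mapped back.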
If you have a transcriptome that has been assembled from shotgun reads, the TSA (Transcriptome Shotgun Assembly) database is a good place to put it so that it can be widely accessed.
This guide assumes that you simply want to submit the assembled sequences from your transcriptome without annotations. NCBI sets a high bar for inclusion of annotations, so annotations for most non-model organisms are unlikely to meet the criteria.
To create a TSA submission, take a look at the NCBI guidelines. This gist is based on those guidelines.
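Nucleotide FASTA files sometimes encode ambiguous bases simply with an 'N'.
Many downstream tools support this but don't support the full set of IUPAC ambiguity codes.
The unix tool tr can be used to convert the other ambiguity codes to 'N'. Note that tr acts on every line, so the same uppercase letters in header lines are changed too, and lowercase codes are left as they are.
tr 'RYSWKMBDHV' 'NNNNNNNNNN' < input.fasta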
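Where the ids must be preserved, a sed-based alternative is one option. The sketch below swaps tr for sed's `y` command so that header lines (those starting with `>`) can be skipped; the function name `mask_ambiguity_codes` is hypothetical, and handling lowercase codes as well is an assumption about what is wanted.

```shell
# Hypothetical sketch: convert IUPAC ambiguity codes to N (or n) on sequence lines only.
# Usage: mask_ambiguity_codes input.fasta > output.fasta
mask_ambiguity_codes() {
    # "/^>/!" restricts the transliteration to lines that are not headers
    sed '/^>/!y/RYSWKMBDHVryswkmbdhv/NNNNNNNNNNnnnnnnnnnn/' "$1"
}
```

Unlike tr, `y` requires both character lists to have the same length, which is why the replacement is written out in full.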
#!/bin/bash
# Converts an mzID file from Thermo nativeID format to scan number only nativeID format
# Both substitutions run in one pass so that <file>.bak holds the untouched original
file=$1
sed -i.bak \
    -e 's/controllerType=[0-9] controllerNumber=[0-9] //' \
    -e 's/Thermo nativeID format/scan number only nativeID format/' \
    "$file"
known_novel_crap_decoy.fasta output from the above workflow is loaded onto Mascot for searching.
TranscriptomePGMakeDatabase to run the Transcriptome PG workflow. This workflow will be related to the Transcriptome PG workflow but should be modified to include a Mascot search for your specific organism.
observed_peptides.gff3 file that you get from running the previous workflow step.
setA = c("A","B","C","D")
setB = c("A","B","E","F")
# All items that appear in either set
union(setA,setB)
# Items present in setB but not in setA
setdiff(setB,setA)
# MSConvert Cheat Sheet
Initial conversion from RAW. Titles in TPP-compatible format
msconvert *.raw --filter "peakPicking true 1-" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>" -z
Select scans within a time range