@iracooke
iracooke / README.md
Last active January 23, 2017 21:57
Quickly Check Sequencing Data

Perform a very quick check of sequencing data across a large number of files.

Generate a range of standard quality metrics with FastQC:

module load FastQC
mkdir fastqc
fastqc *.fastq.gz -o fastqc
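
With many files, opening every FastQC report by hand is slow. A minimal triage sketch, assuming fastqc was run with its --extract option so that each report is unpacked into a directory such as fastqc/sample_fastqc/ containing a tab-separated summary.txt (status, module name, input file). The function name fastqc_failures is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only: assumes fastqc was run with --extract, so every report directory
# (e.g. fastqc/sample_fastqc/) contains summary.txt with tab-separated
# STATUS, module name and input file name on each line.

# fastqc_failures (hypothetical helper): print "input file<TAB>module" for every
# module marked FAIL across all extracted reports in the given directory.
fastqc_failures() {
  local outdir="${1:-fastqc}"
  awk -F'\t' '$1 == "FAIL" { print $3 "\t" $2 }' "$outdir"/*_fastqc/summary.txt
}
```

For example, run `fastqc *.fastq.gz --extract -o fastqc` and then `fastqc_failures fastqc | sort`. Changing "FAIL" to "WARN" in the awk condition lists warnings instead.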
iracooke / README.md
Created October 20, 2016 01:23
GBrowse Setup

GBrowse Setup

This repository contains setup instructions, maintenance scripts and links to data for setting up a GBrowse instance on Ubuntu.

Installation

From a bare Ubuntu 14 image, GBrowse can be installed easily. To install everything:

	sudo apt-get install gbrowse gbrowse-calign gbrowse-data libbio-samtools-perl apache2
iracooke / README.md
Last active August 20, 2016 00:49
Rename fasta identifiers

Rename Fasta IDs

This script is needed for programs like SecretomeP that truncate FASTA IDs.
We need to be able to uniquely identify each FASTA entry, so this script renames the IDs with a numeric scheme. It also produces a mapping file from old to new IDs so the original IDs can be recovered later.

Use it like this:

 ./rename_fasta.rb yourfasta.fasta
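
The rename_fasta.rb script itself is not shown here. As a rough stand-in, the same idea (sequential numeric IDs plus an old-to-new mapping file) can be sketched with awk. The seq1, seq2, ... naming scheme and the function names rename_fasta_ids and restore_fasta_ids are illustrative assumptions, not what the script actually produces:

```bash
#!/usr/bin/env bash
# Sketch only, not the rename_fasta.rb script: the "seq<N>" naming scheme and
# the function names are illustrative assumptions.

# rename_fasta_ids IN OUT MAP
# Replace each header with >seq1, >seq2, ... and write "old header<TAB>new ID"
# lines to MAP so the original headers can be recovered later.
rename_fasta_ids() {
  local in="$1" out="$2" map="$3"
  awk -v map="$map" '
    /^>/ {
      n++
      old = substr($0, 2)          # full original header without the ">"
      new = "seq" n
      print old "\t" new > map
      print ">" new
      next
    }
    { print }                      # sequence lines pass through unchanged
  ' "$in" > "$out"
}

# restore_fasta_ids IN MAP
# Reverse the renaming: the first word of each header is looked up in MAP;
# headers with no entry in MAP are printed unchanged.
restore_fasta_ids() {
  local in="$1" map="$2"
  awk -F'\t' '
    NR == FNR { orig[$2] = $1; next }        # first file: the mapping table
    /^>/ {
      split(substr($0, 2), parts, /[ \t]/)
      if (parts[1] in orig) { print ">" orig[parts[1]]; next }
    }
    { print }
  ' "$map" "$in"
}
```

The mapping file is plain tab-separated text, so it can also be joined against any tool output that reports the short IDs.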
iracooke / README.md
Last active October 28, 2022 05:18
NCBI TSA Submission Guide

Steps to submit to TSA

If you have a transcriptome that has been assembled from shotgun reads, the TSA (Transcriptome Shotgun Assembly) database is a good place to put it so that it can be widely accessed.

This guide assumes that you simply want to submit the assembled sequences from your transcriptome without annotations. NCBI sets a high bar for including annotations, so for most non-model organisms the annotations are unlikely to meet its criteria.

To create a TSA submission, take a look at the NCBI guidelines. This gist is based on those guidelines.

Register BioProject

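Nucleotide FASTA files sometimes encode ambiguous bases simply with an 'N'.
Many downstream tools support this but don't support the full set of IUPAC ambiguity codes.

The Unix tool tr can be used to replace the other codes with N. Note that tr acts on every line, including headers, so only use it when no header contains these letters (a Trinity ID such as TRINITY_DN1, for example, would be altered):

  tr 'RYSWKMBDHV' 'N' < input.fasta > output.fasta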
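
tr rewrites header lines as well as sequence lines, so any ID containing one of these letters would be changed. To avoid that, the substitution can be restricted to sequence lines. A minimal sketch with awk (the function name iupac_to_n is hypothetical); it also maps lowercase codes to a lowercase n so soft-masked regions stay lowercase:

```bash
#!/usr/bin/env bash
# iupac_to_n (hypothetical helper): replace IUPAC ambiguity codes with N in
# sequence lines only; header lines starting with ">" are printed unchanged.
# Lowercase (e.g. soft-masked) codes become lowercase n.
iupac_to_n() {
  awk '
    /^>/ { print; next }
    {
      gsub(/[RYSWKMBDHV]/, "N")
      gsub(/[ryswkmbdhv]/, "n")
      print
    }
  ' "$@"
}
```

Usage: `iupac_to_n input.fasta > output.fasta`.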
iracooke / thermoid_to_scan_onlyid.sh
Created December 9, 2015 05:22
File Cleanup for PRIDE
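#!/bin/bash
# Converts an mzID file from Thermo nativeID format to scan number only nativeID format.
# Usage: ./thermoid_to_scan_onlyid.sh file.mzid   (the original is kept as file.mzid.bak)
set -euo pipefail
file="$1"
# One sed call with two quoted expressions, so the .bak file holds the untouched original
sed -i.bak \
  -e 's/controllerType=[0-9] controllerNumber=[0-9] //g' \
  -e 's/Thermo nativeID format/scan number only nativeID format/g' "$file"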
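
Before uploading to PRIDE, a quick check that no Thermo-style identifiers remain can save a failed submission. A minimal sketch; the function name mzid_is_converted is hypothetical, and it only looks for the two strings that the script above replaces:

```bash
#!/usr/bin/env bash
# mzid_is_converted (hypothetical helper): succeed only if FILE no longer
# contains either of the Thermo-specific strings that the conversion removes.
mzid_is_converted() {
  local file="$1"
  if grep -q -e 'controllerType=' -e 'Thermo nativeID format' "$file"; then
    echo "Thermo-style nativeIDs remain in $file" >&2
    return 1
  fi
  return 0
}
```

For example: `./thermoid_to_scan_onlyid.sh results.mzid && mzid_is_converted results.mzid`.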
iracooke / README.md
Last active September 20, 2015 23:53
Create a combined transdecoder + 6frame database

Creating a protein database from 6-frame and TransDecoder sequences

Analyses in Galaxy

  1. Run the TranscriptomePGMakeDatabase workflow. Its inputs include a Trinity assembly, predicted proteins from TransDecoder, GFF3 coordinates corresponding to the TransDecoder predictions, and the cRAP database of contaminants.
  2. Ensure that the known_novel_crap_decoy.fasta output from the above workflow is loaded onto Mascot for searching.
  3. Use the outputs from TranscriptomePGMakeDatabase to run the Transcriptome PG workflow. You will probably need a copy of this workflow modified to include a Mascot search for your specific organism.
  4. Download the observed_peptides.gff3 file that you get from running the previous workflow step.
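
Search engines generally expect every entry in a FASTA database to have a unique ID, and merging 6-frame and TransDecoder sequences is one place where clashes could creep in. A quick check before loading known_novel_crap_decoy.fasta onto Mascot might look like this (a sketch; the function name fasta_duplicate_ids is hypothetical, and an ID is taken to be the first word of the header):

```bash
#!/usr/bin/env bash
# fasta_duplicate_ids (hypothetical helper): print every FASTA ID (first word
# of a header line) that occurs more than once across the given files.
fasta_duplicate_ids() {
  awk '/^>/ { split(substr($0, 2), f, /[ \t]/); print f[1] }' "$@" | sort | uniq -d
}
```

If every ID is unique, `fasta_duplicate_ids known_novel_crap_decoy.fasta` prints nothing.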
iracooke / setOperations.R
Created August 18, 2015 01:39
R Cheat Sheet
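setA <- c("A", "B", "C", "D")
setB <- c("A", "B", "E", "F")
# Everything in either set (union)
union(setA, setB)      # "A" "B" "C" "D" "E" "F"
# Items in setB that are not in setA (argument order matters)
setdiff(setB, setA)    # "E" "F"
# Items present in both sets
intersect(setA, setB)  # "A" "B"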
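
When the two sets are ID lists stored in plain text files (one item per line) rather than R vectors, the same operations can be done in the shell with sort and comm. A sketch with hypothetical function names; it requires bash for the <( ) process substitution:

```bash
#!/usr/bin/env bash
# Set operations on plain-text lists (one item per line). Function names are
# hypothetical. Inputs are sorted and de-duplicated first because comm
# expects sorted input.
set_union()     { sort -u "$1" "$2"; }                          # items in either file
set_intersect() { comm -12 <(sort -u "$1") <(sort -u "$2"); }   # items in both files
set_diff()      { comm -23 <(sort -u "$1") <(sort -u "$2"); }   # items in $1 but not in $2
```

`set_diff setB.txt setA.txt` corresponds to `setdiff(setB, setA)`, except that the output is sorted rather than kept in input order.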
iracooke / readme.md
Last active August 29, 2015 14:24
MSConvert Cheat Sheet

MSConvert Cheat Sheet

Initial conversion from RAW, with spectrum titles in TPP-compatible format:

	msconvert *.raw --filter "peakPicking true 1-" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>" -z
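
The titleMaker template above produces titles of the form RunId.Scan.Scan.Charge. If those fields are needed again later, for example to match search results back to spectra, here is a sketch that splits titles read from standard input. It parses from the right so that run IDs containing dots survive (the function name parse_tpp_title is hypothetical):

```bash
#!/usr/bin/env bash
# parse_tpp_title (hypothetical helper): read titles of the form
# RunId.StartScan.EndScan.Charge on stdin and print RunId<TAB>StartScan<TAB>Charge.
# Fields are taken from the right, so run IDs that contain dots stay intact.
parse_tpp_title() {
  awk -F'.' 'NF >= 4 {
    run = $1
    for (i = 2; i <= NF - 3; i++) run = run "." $i
    printf "%s\t%s\t%s\n", run, $(NF - 2), $NF
  }'
}
```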

Select scans within a time range