@andrese52
Last active April 11, 2018 03:05
Eprobe curation

E-probe design and curation

BLAST of e-probes against the nt database

blastn -task blastn -query eprobes.fasta -db nt -out e-probes.txt -num_threads 12
BlastCheck.pl -c raw-eprobes.fasta -i blast_output.txt -p output_name -f common,names,comma,separated,without,spaces

Take the list of common names as an example. Huanglongbing (HLB, or citrus greening) is a disease of citrus species caused by Candidatus Liberibacter bacteria. Searching the NCBI database and downloading sequences quickly shows that the FASTA headers contain the following names: Greening,Huanglongbing,Candidatus,liberibacter,asiaticus,americanus,africanus. If the e-probes are specific to HLB, these names must be passed to the -f argument.
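One way to find candidate names for -f is to count the words that occur most often in the FASTA headers of the downloaded sequences. The sketch below is only an illustration: the helper names header_words and join_names and the file name downloaded_sequences.fasta are hypothetical, and it assumes headers of the usual form ">accession description".

```shell
# Hypothetical helper: list the most frequent words found in the FASTA
# headers of a file (accession removed), to help pick names for -f.
# Usage: header_words FILE [HOW_MANY]
header_words() {
    grep '^>' "$1" \
        | sed 's/^>[^ ]* *//' \
        | tr -cs '[:alpha:]' '\n' \
        | grep -v '^$' \
        | sort | uniq -c | sort -rn | head -n "${2:-20}"
}

# Hypothetical helper: join its arguments with commas and no spaces,
# which is the form the -f argument is given in above.
join_names() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Example (commented out because it needs real input files):
# header_words downloaded_sequences.fasta 10
# BlastCheck.pl -c raw-eprobes.fasta -i blast_output.txt -p output_name \
#     -f "$(join_names Greening Huanglongbing Candidatus liberibacter asiaticus americanus africanus)"
```

The word counts are only a starting point; the final list should still be chosen by hand so that it covers the target organism and nothing else.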

Sequence manipulation before MetaSim

FASTA headers are a headache in MetaSim, so we remove them to avoid having to modify thousands of entries in the taxon profile. To remove the headers of a multi-FASTA file, use the following command:

sed '/^>/d' target_multifasta.fasta > output_singleheader.fasta
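Note that the sed command above deletes every header line, so its output contains sequence lines only. If a single header has to be kept, as the output file name suggests, a variant like the following could be used. This is a sketch and not part of the original workflow: the helper name keep_first_header is hypothetical, and whether MetaSim needs the header to be present is an assumption.

```shell
# Hypothetical helper: keep the first FASTA header and drop all later
# ones, so every sequence line ends up under a single record.
# Usage: keep_first_header FILE > OUTPUT
keep_first_header() {
    awk '/^>/ { if (seen++) next } { print }' "$1"
}

# Example (commented out because it needs a real input file):
# keep_first_header target_multifasta.fasta > output_singleheader.fasta
```

Either way the sequences are concatenated in file order, so the result represents one long record rather than the individual entries.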