Created
January 27, 2015 17:40
-
-
Save olgabot/f65365842e27d2487ad3 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This readme should help you get started with "pfam_scan.pl", which is for use | |
with the HMMER3 version of HMMER. | |
-------------------------------------------------------------------------------- | |
- Setting up ------------------------------------------------------------------- | |
-------------------------------------------------------------------------------- | |
Unpack the script | |
================= | |
shell% tar zxvf PfamScan.tar.gz | |
... | |
shell% cd PfamScan | |
Install HMMER3 | |
============== | |
1) Get the current tarball of the latest HMMER3 release from the HMMER site at | |
Janelia farm. For example: | |
shell% wget ftp://selab.janelia.org/pub/software/hmmer3/3.1b1/hmmer-3.1b1.tar.gz | |
2) Uncompress the tarball | |
shell% tar zxvf hmmer-3.1b1.tar.gz | |
3) Compile the HMMER3 source code (optional - skip to the section about adding | |
HMMER3 binaries to your path). You can also download precompiled binaries, but | |
it is recommmended that you compile your source code with your system compiler. | |
HMMER usually compiles in three easy steps. For detailed instructions, see the | |
HMMER3 INSTALL guide. | |
i) Change the the HMMER source directory: | |
shell% cd hmmer-3.1b1 | |
ii) Run "configure". For best performance, and depending on the architecture of | |
the machine that will be running HMMER, it's recommended that you use the Intel | |
compiler, "icc" to build the HMMER binaries. You can also use "gcc" but note | |
that older versions (<3.3) will not correctly compile the code. | |
You can tell "configure" which compiler to use by adding the "CC" flag: | |
shell% ./configure CC=icc LDFLAGS="-static" --prefix=/path/to/install/hmmer3 | |
iii) Run make: | |
shell% make | |
iv) Check that everything is compiled and working, by running: | |
shell% make check | |
v) Install HMMER: | |
shell% make install | |
Adding HMMER3 binaries to your path | |
=================================== | |
You need to make sure that the HMMER binaries are found on your shell's | |
executable search path. | |
If you are using bash: | |
bash% export PATH=/path/to/install/hmmer3:$PATH | |
or, if you are using csh (or tcsh): | |
csh% setenv PATH /path/to/install/hmmer3:$PATH | |
Note: if you are using the pre-compiled binaries, just point to those, e.g. | |
/path/to/uncompressedTar/hmmer-3.0b3/binaries. | |
Non-standard Perl dependencies | |
============================== | |
The PfamScan.pm module depends on several modules that don't come as part of a | |
standard Perl distribution, notably the Moose framework. Everything that you | |
need can be installed using the "cpan" tool. You'll need to make sure that | |
you've already configured your "cpan" environment, and then: | |
shell% cpan Moose | |
Moose itself has quite a few dependencies, so don't worry if it looks like | |
you're installing half of CPAN ! | |
PfamScan.pm also requires bioperl. We have currently only tested against | |
bioperl 1.4, and we believe it works with 1.6. You can install bioperl via | |
CPAN, or you can download bioperl 1.4 from here: | |
http://bioperl.org/DIST/bioperl-1.4.tar.gz | |
Adding Pfam Modules to your PERL5LIB | |
==================================== | |
If you are using bash: | |
bash% export PERL5LIB=/path/to/pfam_scanDir:$PERL5LIB | |
or for C-shells: | |
csh% setenv PERL5LIB /path/to/pfam_scanDir:$PERL5LIB | |
Change the path of Perl in pfam_scan.pl | |
======================================= | |
Open PfamScan/pfam_scan.pl in a text editor, and change the first line | |
of the code to point to your Perl. | |
You should be good to go now! | |
-------------------------------------------------------------------------------- | |
- Running searches using "pfam_scan.pl" ---------------------------------------- | |
-------------------------------------------------------------------------------- | |
Download Pfam data files | |
======================== | |
You will need to download the following files from the Pfam ftp site | |
(ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/): | |
Pfam-A.hmm | |
Pfam-A.hmm.dat | |
Pfam-B.hmm | |
Pfam-B.hmm.dat | |
active_site.dat | |
You will need to generate binary files for Pfam-A.hmm and Pfam-B.hmm by running the following commands: | |
hmmpress Pfam-A.hmm | |
hmmpress Pfam-B.hmm | |
Using pfam_scan.pl | |
================== | |
"pfam_scan.pl" is a program that searches a FASTA file against a library of | |
Pfam HMMs. | |
REQUIREMENTS | |
============ | |
To recap, "pfam_scan.pl" requires: | |
- Several modules written by Pfam, all of which are available as part of this | |
tarball | |
- Bioperl and HMMER3 installed (hmmscan and hmmalign should be in your path) | |
- Pfam-A.hmm (and binaries): a data file that contains the Pfam-A library of HMMs | |
- Pfam-B.hmm (and binaries): a data file that contains the Pfam-B library of HMMs | |
- Pfam-A.hmm.dat: a data file that contains information about each Pfam-A | |
family | |
- Pfam-B.hmm.dat: a data file that contains information about each Pfam-B | |
family | |
- active_site.dat: a data file needed for the -as option, which contains active | |
site information about each family | |
- a FASTA-format file containing your query sequence(s) | |
Usage | |
===== | |
pfam_scan.pl -fasta <fasta_file> -dir <directory location of Pfam files> | |
Additonal options: | |
-h : show this help | |
-o <file> : output file, otherwise send to STDOUT | |
-clan_overlap : show overlapping hits within clan member families (applies to Pfam-A families only) | |
-align : show the HMM-sequence alignment for each match | |
-e_seq <n> : specify hmmscan evalue sequence cutoff for Pfam-A searches (default Pfam defined) | |
-e_dom <n> : specify hmmscan evalue domain cutoff for Pfam-A searches (default Pfam defined) | |
-b_seq <n> : specify hmmscan bit score sequence cutoff for Pfam-A searches (default Pfam defined) | |
-b_dom <n> : specify hmmscan bit score domain cutoff for Pfam-A searches (default Pfam defined) | |
-pfamB : search against Pfam-B HMMs (uses E-value sequence and domain cutoff 0.001), | |
in addition to searching Pfam-A HMMs | |
-only_pfamB : search against Pfam-B HMMs only (uses E-value sequence and domain cutoff 0.001) | |
-as : predict active site residues for Pfam-A matches | |
-json [pretty] : write results in JSON format. If the optional value "pretty" is given, | |
the JSON output will be formatted using the "pretty" option in the JSON | |
module | |
For more help, check the perldoc: | |
shell% perldoc pfam_scan.pl | |
Output format | |
============= | |
The output should be familiar to anyone who's used the old "pfam_scan.pl" | |
script. Each line contains the following information: | |
<seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score> <E-value><significance> <clan> <predicted_active_site_residues> | |
Example output (with -as option): | |
Q5NEL3.1 2 224 2 227 PB01348.1 Pfam-B_13481 Pfam-B 1 184 226 358.5 1.4e-107 NA NA | |
O65039.1 38 93 38 93 PF08246.5 Inhibitor_I29 Domain 1 58 58 45.9 2.8e-12 1 No_clan | |
O65039.1 126 342 126 342 PF00112.16 Peptidase_C1 Domain 1 216 216 296.0 1.1e-88 1 CL0125 predicted_active_site[150,285,307] | |
FAQ | |
=== | |
Q: Why are the results from running hmmscan different from those I get when I run pfam_scan.pl? | |
A: We group together families which we believe to have a common evolutionary ancestor in clans. Where there | |
are overlapping matches within a clan, pfam_scan.pl will only show the most significant (the lowest E-value) | |
match within the clan. We perform the same clan filtering step on the Pfam website. If you do want the | |
script to report all the overlapping clan matches, you can use the -clan_overlap option. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment