@olgabot
Created January 27, 2015 17:40
This readme should help you get started with "pfam_scan.pl", which is for use
with the HMMER3 version of HMMER.
--------------------------------------------------------------------------------
- Setting up -------------------------------------------------------------------
--------------------------------------------------------------------------------
Unpack the script
=================
shell% tar zxvf PfamScan.tar.gz
...
shell% cd PfamScan
Install HMMER3
==============
1) Get the tarball of the latest HMMER3 release from the HMMER site at
Janelia Farm. For example:
shell% wget ftp://selab.janelia.org/pub/software/hmmer3/3.1b1/hmmer-3.1b1.tar.gz
2) Uncompress the tarball
shell% tar zxvf hmmer-3.1b1.tar.gz
3) Compile the HMMER3 source code (optional; if you are using precompiled
binaries, skip to the section about adding HMMER3 binaries to your path).
Precompiled binaries are available for download, but it is recommended that
you compile the source code with your system compiler.
HMMER usually compiles in three easy steps. For detailed instructions, see the
HMMER3 INSTALL guide.
i) Change to the HMMER source directory:
shell% cd hmmer-3.1b1
ii) Run "configure". For best performance, and depending on the architecture of
the machine that will be running HMMER, it's recommended that you use the Intel
compiler, "icc" to build the HMMER binaries. You can also use "gcc" but note
that older versions (<3.3) will not correctly compile the code.
You can tell "configure" which compiler to use by adding the "CC" flag:
shell% ./configure CC=icc LDFLAGS="-static" --prefix=/path/to/install/hmmer3
iii) Run make:
shell% make
iv) Check that everything is compiled and working, by running:
shell% make check
v) Install HMMER:
shell% make install
Adding HMMER3 binaries to your path
===================================
You need to make sure that the HMMER binaries are found on your shell's
executable search path.
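"make install" places the programs in the "bin" directory under the prefix
that you gave to "configure", so that is the directory to add.
If you are using bash:
bash% export PATH=/path/to/install/hmmer3/bin:$PATH
or, if you are using csh (or tcsh):
csh% setenv PATH /path/to/install/hmmer3/bin:$PATH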
Note: if you are using the pre-compiled binaries, just point to those, e.g.
/path/to/uncompressedTar/hmmer-3.0b3/binaries.
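A quick way to confirm that the path is set correctly is to ask the shell
whether it can find the programs. The sketch below is only an illustration:
"missing_tools" is a hypothetical helper name and is not part of PfamScan or
HMMER. It prints the name of every program that cannot be found.

```shell
# Print the name of each program that cannot be found on PATH.
# "missing_tools" is a hypothetical helper, not part of PfamScan or HMMER.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Usage (prints nothing when both programs are found):
# missing_tools hmmscan hmmalign
```

If the call prints a name, re-check the directory that you added to PATH.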
Non-standard Perl dependencies
==============================
The PfamScan.pm module depends on several modules that don't come as part of a
standard Perl distribution, notably the Moose framework. Everything that you
need can be installed using the "cpan" tool. You'll need to make sure that
you've already configured your "cpan" environment, and then:
shell% cpan Moose
Moose itself has quite a few dependencies, so don't worry if it looks like
you're installing half of CPAN!
PfamScan.pm also requires bioperl. We have currently only tested against
bioperl 1.4, and we believe it works with 1.6. You can install bioperl via
CPAN, or you can download bioperl 1.4 from here:
http://bioperl.org/DIST/bioperl-1.4.tar.gz
Adding Pfam Modules to your PERL5LIB
====================================
If you are using bash:
bash% export PERL5LIB=/path/to/pfam_scanDir:$PERL5LIB
or for C-shells:
csh% setenv PERL5LIB /path/to/pfam_scanDir:$PERL5LIB
Change the path of Perl in pfam_scan.pl
=======================================
Open PfamScan/pfam_scan.pl in a text editor, and change the first line
of the code (the "#!" line) to point to your Perl.
You should be good to go now!
--------------------------------------------------------------------------------
- Running searches using "pfam_scan.pl" ----------------------------------------
--------------------------------------------------------------------------------
Download Pfam data files
========================
You will need to download the following files from the Pfam ftp site
(ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/):
Pfam-A.hmm
Pfam-A.hmm.dat
Pfam-B.hmm
Pfam-B.hmm.dat
active_site.dat
You will need to generate binary files for Pfam-A.hmm and Pfam-B.hmm by running the following commands:
hmmpress Pfam-A.hmm
hmmpress Pfam-B.hmm
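If you prepare the data directory more than once, it can be convenient to skip
libraries that have already been pressed. The following is a minimal sketch
under stated assumptions: "press_pfam" is a hypothetical helper name, and the
check relies on ".h3i" being one of the index files that hmmpress writes next
to the library.

```shell
# Press each Pfam HMM library in a directory unless its index already exists.
# "press_pfam" is a hypothetical helper name; the ".h3i" file is assumed to be
# one of the index files that hmmpress writes next to the library.
press_pfam() {
    dir=$1
    for lib in Pfam-A.hmm Pfam-B.hmm; do
        if [ ! -f "$dir/$lib" ]; then
            echo "missing: $dir/$lib" >&2
            return 1
        fi
        if [ ! -f "$dir/$lib.h3i" ]; then
            hmmpress "$dir/$lib" || return 1
        fi
    done
}

# Usage:
# press_pfam /path/to/pfam_data
```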
Using pfam_scan.pl
==================
"pfam_scan.pl" is a program that searches a FASTA file against a library of
Pfam HMMs.
REQUIREMENTS
============
To recap, "pfam_scan.pl" requires:
- Several modules written by Pfam, all of which are available as part of this
tarball
- Bioperl and HMMER3 installed (hmmscan and hmmalign should be in your path)
- Pfam-A.hmm (and binaries): a data file that contains the Pfam-A library of HMMs
- Pfam-B.hmm (and binaries): a data file that contains the Pfam-B library of HMMs
- Pfam-A.hmm.dat: a data file that contains information about each Pfam-A
family
- Pfam-B.hmm.dat: a data file that contains information about each Pfam-B
family
- active_site.dat: a data file needed for the -as option, which contains active
site information about each family
- a FASTA-format file containing your query sequence(s)
Usage
=====
pfam_scan.pl -fasta <fasta_file> -dir <directory location of Pfam files>
Additional options:
-h             : show this help
-o <file>      : output file, otherwise send to STDOUT
-clan_overlap  : show overlapping hits within clan member families (applies to Pfam-A families only)
-align         : show the HMM-sequence alignment for each match
-e_seq <n>     : specify hmmscan evalue sequence cutoff for Pfam-A searches (default Pfam defined)
-e_dom <n>     : specify hmmscan evalue domain cutoff for Pfam-A searches (default Pfam defined)
-b_seq <n>     : specify hmmscan bit score sequence cutoff for Pfam-A searches (default Pfam defined)
-b_dom <n>     : specify hmmscan bit score domain cutoff for Pfam-A searches (default Pfam defined)
-pfamB         : search against Pfam-B HMMs (uses E-value sequence and domain cutoff 0.001),
                 in addition to searching Pfam-A HMMs
-only_pfamB    : search against Pfam-B HMMs only (uses E-value sequence and domain cutoff 0.001)
-as            : predict active site residues for Pfam-A matches
-json [pretty] : write results in JSON format. If the optional value "pretty" is given,
                 the JSON output will be formatted using the "pretty" option in the JSON
                 module
For more help, check the perldoc:
shell% perldoc pfam_scan.pl
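As an illustration of the usage line above, the following sketch runs the
script once for every ".fasta" file in a directory, using only the -fasta,
-dir, -o and -as options listed above. The helper name "scan_all", the
PFAM_SCAN variable and the ".pfam" output suffix are assumptions made for this
example; they are not part of PfamScan.

```shell
# Run pfam_scan.pl on every .fasta file in a directory, one output file each.
# "scan_all", the PFAM_SCAN variable and the ".pfam" suffix are hypothetical
# choices for this sketch. PFAM_SCAN lets you point at a specific copy of the
# script; it defaults to "pfam_scan.pl" found on PATH.
scan_all() {
    fasta_dir=$1
    pfam_dir=$2
    for fa in "$fasta_dir"/*.fasta; do
        # Skip the unexpanded pattern when no .fasta files are present.
        [ -f "$fa" ] || continue
        "${PFAM_SCAN:-pfam_scan.pl}" -fasta "$fa" -dir "$pfam_dir" \
            -o "${fa%.fasta}.pfam" -as || return 1
    done
}

# Usage:
# scan_all /path/to/queries /path/to/pfam_data
```

Because the sketch passes -as, the Pfam data directory also needs to contain
active_site.dat.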
Output format
=============
The output should be familiar to anyone who's used the old "pfam_scan.pl"
script. Each line contains the following information:
<seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score> <E-value> <significance> <clan> <predicted_active_site_residues>
Example output (with -as option):
Q5NEL3.1 2 224 2 227 PB01348.1 Pfam-B_13481 Pfam-B 1 184 226 358.5 1.4e-107 NA NA
O65039.1 38 93 38 93 PF08246.5 Inhibitor_I29 Domain 1 58 58 45.9 2.8e-12 1 No_clan
O65039.1 126 342 126 342 PF00112.16 Peptidase_C1 Domain 1 216 216 296.0 1.1e-88 1 CL0125 predicted_active_site[150,285,307]
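Because the fields are separated by whitespace, the output is easy to filter
with standard tools. The sketch below is one possible approach, not something
shipped with PfamScan: "hits_below" is a hypothetical helper name, and the
field numbers are taken from the column order given above (field 1 is the
sequence id, field 7 the HMM name and field 13 the E-value). Lines with fewer
than 15 fields, or starting with "#", are skipped on the assumption that they
are not hit lines.

```shell
# Print "<seq id> <hmm name> <E-value>" for hits at or below an E-value cutoff.
# "hits_below" is a hypothetical helper name; field numbers follow the column
# order described in the output format section.
hits_below() {
    cutoff=$1
    file=$2
    awk -v max="$cutoff" '
        /^#/ || NF < 15 { next }    # assumed not to be hit lines
        ($13 + 0) <= (max + 0) { print $1, $7, $13 }
    ' "$file"
}

# Usage:
# hits_below 1e-50 results.txt
```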
FAQ
===
Q: Why are the results from running hmmscan different from those I get when I run pfam_scan.pl?
A: We group together families which we believe to have a common evolutionary ancestor in clans. Where there
are overlapping matches within a clan, pfam_scan.pl will only show the most significant (the lowest E-value)
match within the clan. We perform the same clan filtering step on the Pfam website. If you do want the
script to report all the overlapping clan matches, you can use the -clan_overlap option.
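To make the clan filtering idea concrete, here is a small sketch that applies
a similar rule to output produced with -clan_overlap. It is an illustration
only and not the code that pfam_scan.pl uses. It assumes that two hits overlap
when their envelope coordinates (fields 4 and 5) intersect on the same
sequence, that "No_clan" and "NA" in field 15 mean the family has no clan, and
that hits are considered in order of increasing E-value. "clan_filter" is a
hypothetical helper name.

```shell
# Keep only the lowest E-value hit among overlapping hits from the same clan.
# "clan_filter" is a hypothetical helper name. Assumptions: overlap is judged
# on envelope coordinates (fields 4 and 5), and "No_clan" or "NA" in field 15
# means the hit is never filtered.
clan_filter() {
    awk '
        NF >= 15 && $1 !~ /^#/ {
            n++
            line[n] = $0; seq[n] = $1; s[n] = $4 + 0; e[n] = $5 + 0
            ev[n] = $13 + 0; clan[n] = $15
        }
        END {
            # Order hit indices by ascending E-value (insertion sort).
            for (i = 1; i <= n; i++) ord[i] = i
            for (i = 2; i <= n; i++) {
                k = ord[i]
                for (j = i - 1; j >= 1 && ev[ord[j]] > ev[k]; j--)
                    ord[j + 1] = ord[j]
                ord[j + 1] = k
            }
            # Drop a hit if it overlaps a better, kept hit from the same clan.
            for (i = 1; i <= n; i++) {
                k = ord[i]; keep[k] = 1
                if (clan[k] == "No_clan" || clan[k] == "NA") continue
                for (j = 1; j < i; j++) {
                    b = ord[j]
                    if (keep[b] && seq[b] == seq[k] && clan[b] == clan[k] &&
                        s[k] <= e[b] && s[b] <= e[k]) { keep[k] = 0; break }
                }
            }
            # Print the surviving hits in their original order.
            for (k = 1; k <= n; k++) if (keep[k]) print line[k]
        }
    ' "$@"
}

# Usage:
# clan_filter results_with_clan_overlap.txt
```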