HIPPI: highly accurate protein family classification with ensembles of HMMs. (November 2016)
- Record Type:
- Journal Article
- Title:
- HIPPI: highly accurate protein family classification with ensembles of HMMs. (November 2016)
- Main Title:
- HIPPI: highly accurate protein family classification with ensembles of HMMs
- Authors:
- Nguyen, Nam-phuong
Nute, Michael
Mirarab, Siavash
Warnow, Tandy - Abstract:
- Abstract Background Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. Results We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. Conclusion HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improveAbstract Background Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. Results We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. Conclusion HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub athttps://github.com/smirarab/sepp . … (more)
- Is Part Of:
- BMC genomics. Volume 17:Number 10(2016)
- Journal:
- BMC genomics
- Issue:
- Volume 17:Number 10(2016)
- Issue Display:
- Volume 17, Issue 10 (2016)
- Year:
- 2016
- Volume:
- 17
- Issue:
- 10
- Issue Sort Value:
- 2016-0017-0010-0000
- Page Start:
- 89
- Page End:
- 100
- Publication Date:
- 2016-11
- Subjects:
- Protein family identification -- Ensemble of profile hidden Markov models -- Pfam
Genomes -- Periodicals
Gene mapping -- Periodicals
Genomics -- Periodicals
Base Sequence -- Periodicals
Chromosome Mapping -- Periodicals
Genetic Techniques -- Periodicals
Sequence Analysis, DNA -- Periodicals
572.8605 - Journal URLs:
- http://www.biomedcentral.com/bmcgenomics/ ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=32 ↗
http://link.springer.com/ ↗ - DOI:
- 10.1186/s12864-016-3097-0 ↗
- Languages:
- English
- ISSNs:
- 1471-2164
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 10045.xml