Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments. (8th June 2020)

Record Type:: Journal Article
Title:: Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments. (8th June 2020)
Main Title:: Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments
Authors:: Neuwald, Andrew F
Lanczycki, Christopher J
Hodges, Theresa K
Marchler-Bauer, Aron
Abstract:: For optimal performance, machine learning methods for protein sequence/structural analysis typically require as input a large multiple sequence alignment (MSA), which is often created using query-based iterative programs such as PSI-BLAST or JackHMMER. However, because these programs align database sequences using a query sequence as a template, they may fail to detect, or tend to misalign, sequences distantly related to the query. More generally, automated MSA programs often fail to align sequences correctly due to the unpredictable nature of protein evolution. Addressing this problem typically requires manual curation in the light of structural data. However, curated MSAs tend to contain too few sequences to serve as input for statistically based methods. We address these shortcomings by making publicly available a set of 252 curated hierarchical MSAs (hiMSAs), containing a total of 26 212 066 sequences, along with programs for generating extremely large MSAs from these hiMSAs. Each hiMSA consists of a set of hierarchically arranged MSAs representing individual subgroups within a superfamily, along with template MSAs specifying how to align each subgroup MSA against MSAs higher up the hierarchy. Central to this approach is the MAPGAPS search program, which uses a hiMSA as a query to align (potentially vast numbers of) matching database sequences with accuracy comparable to that of the curated hiMSA. We illustrate this process for the …
Is Part Of:: Database. Volume 2020 (2020)
Journal:: Database
Issue:: Volume 2020 (2020)
Issue Display:: Volume 2020, Issue 2020 (2020)
Year:: 2020
Volume:: 2020
Issue:: 2020
Issue Sort Value:: 2020-2020-2020-0000
Page Start::
Page End::
Publication Date:: 2020-06-08
Subjects:: Biology -- Databases -- Periodicals
Bioinformatics -- Periodicals
570.285
Journal URLs:: http://database.oxfordjournals.org/
http://ukcatalogue.oup.com/
DOI:: 10.1093/database/baaa042
Languages:: English
ISSNs:: 1758-0463
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 25837.xml