Toward completion of the Earth's proteome: an update a decade later. (12th October 2017)
- Record Type:
- Journal Article
- Title:
- Toward completion of the Earth's proteome: an update a decade later. (12th October 2017)
- Main Title:
- Toward completion of the Earth's proteome: an update a decade later
- Authors:
- Mier, Pablo
Andrade-Navarro, Miguel A
- Abstract:
- Protein databases are steadily growing driven by the spread of new more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the incapability to process and curate sequence entries as fast as they are created. To understand these trends and aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth mostly because of a focus on increasing the level of annotation of its sequences, its counterpart TrEMBL, much less limited by curation steps, is still in a phase of accelerated growth. However, we predict that before 2020, almost all entries deposited in UniProtKB will be homologous to known proteins. We propose that new sequencing projects can be made more useful if they are driven to sequencing voids, parts of the tree of life far from already sequenced species or model organisms. We show these voids are present in the Archaea and Eukarya domains of life. The approach to the certainty of the redundancy of new protein sequence entries leads to the consideration that most of the protein diversity on Earth has already been described, which we estimate to be of around 3.75 million proteins, revising down the prediction we did a decade ago.
- Is Part Of:
- Briefings in bioinformatics. Volume 20:Number 2(2019)
- Journal:
- Briefings in bioinformatics
- Issue:
- Volume 20:Number 2(2019)
- Issue Display:
- Volume 20, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 20
- Issue:
- 2
- Issue Sort Value:
- 2019-0020-0002-0000
- Page Start:
- 463
- Page End:
- 470
- Publication Date:
- 2017-10-12
- Subjects:
- protein databases -- sequencing projects
Genetics -- Data processing -- Periodicals
Molecular biology -- Data processing -- Periodicals
Genomes -- Data processing -- Periodicals
572.80285
- Journal URLs:
- http://bib.oxfordjournals.org
http://www.oxfordjournals.org/content?genre=journal&issn=1477-4054
http://ukcatalogue.oup.com/
http://firstsearch.oclc.org
- DOI:
- 10.1093/bib/bbx127
- Languages:
- English
- ISSNs:
- 1467-5463
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2283.958363
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20847.xml