The Power of Universal Contextualized Protein Embeddings in Cross-species Protein Function Prediction. (December 2021)
- Record Type:
- Journal Article
- Title:
- The Power of Universal Contextualized Protein Embeddings in Cross-species Protein Function Prediction. (December 2021)
- Main Title:
- The Power of Universal Contextualized Protein Embeddings in Cross-species Protein Function Prediction
- Authors:
- van den Bent, Irene
Makrodimitris, Stavros
Reinders, Marcel
- Abstract:
- Computationally annotating proteins with a molecular function is a difficult problem, made even harder by the limited amount of labeled protein training data available. Unsupervised protein embeddings partly circumvent this limitation by learning a universal protein representation from many unlabeled sequences. Such embeddings incorporate the contextual information of amino acids, thereby modeling the underlying principles of protein sequences independently of species context. We used an existing pre-trained protein embedding method and characterized its molecular function prediction performance in detail, first to advance the understanding of protein language models, and second to identify areas for improvement. We then applied the model in a transfer learning task: we trained a function predictor on the embeddings of annotated protein sequences from one training species and made predictions on the proteins of several test species at varying evolutionary distances. We show that this approach successfully generalizes knowledge about protein function from one eukaryotic species to various other species, outperforming both an alignment-based and a supervised-learning-based baseline. This implies that such a method could be effective for molecular function prediction in inadequately annotated species from understudied taxonomic kingdoms.
- Is Part Of:
- Evolutionary bioinformatics online. Volume 17(2021)
- Journal:
- Evolutionary bioinformatics online
- Issue:
- Volume 17(2021)
- Issue Display:
- Volume 17, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 17
- Issue:
- 2021
- Issue Sort Value:
- 2021-0017-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-12
- Subjects:
- Protein function prediction -- protein language models -- protein embedding -- transfer learning -- annotating evolutionarily distant proteins
Bioinformatics -- Periodicals
Evolutionary computation -- Periodicals
Genetic programming (Computer science) -- Periodicals
Computational Biology
Evolution, Molecular
Bioinformatics
Electronic journals
Periodicals
Fulltext
Internet Resources
576.8
- Journal URLs:
- http://insights.sagepub.com/journal-evolutionary-bioinformatics-j17
http://www.uk.sagepub.com/home.nav
http://www.la-press.com/evolutionary-bioinformatics-journal-j17
http://bibpurl.oclc.org/web/38943
- DOI:
- 10.1177/11769343211062608
- Languages:
- English
- ISSNs:
- 1176-9343
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18264.xml