Protein sequence profile prediction using ProtAlbert transformer. (August 2022)
- Record Type:
- Journal Article
- Title:
- Protein sequence profile prediction using ProtAlbert transformer. (August 2022)
- Main Title:
- Protein sequence profile prediction using ProtAlbert transformer
- Authors:
- Behjati, Armin
Zare-Mirakabad, Fatemeh
Arab, Seyed Shahriar
Nowzari-Dalini, Abbas
- Abstract:
- Profiles are used to model protein families and domains. They are built by multiple sequence alignments obtained by mapping a query sequence against a database to generate a profile based on the substitution scoring matrix. The profile applications are very dependent on the alignment algorithm and scoring system for amino acid substitution. However, sometimes there are no similar sequences in the database with the query sequence based on the scoring schema. In these cases, it is not possible to make a profile. This paper proposes a method named PA_SPP, based on pre-trained ProtAlbert transformer to predict the profile for a single protein sequence without alignment. The performance of transformers on natural languages is impressive. Protein sequences can be viewed as a language; we can benefit from these models. We analyze the attention heads in different layers of ProtAlbert to show that the transformer can capture five essential protein characteristics of a single sequence. This assessment shows that ProtAlbert considers some protein properties when suggesting amino acids for each position in the sequence. In other words, transformers can be considered an appropriate alternative for alignment and scoring schema to predict a profile. We evaluate PA_SPP on the Casp13 dataset, including 55 proteins. Meanwhile, one thermophilic and two mesophilic proteins are used as case studies. The results display high similarity between the predicted profiles and HSSP profiles.
Highlights:
Introducing PA_SPP method for profile prediction using a single protein sequence without alignment.
Using pre-trained ProtAlbert transformer in PA_SPP method instead of alignment and scoring systems for profile prediction.
Assessing the attention heads of transformer to capture five essential protein characteristics from a single sequence.
According to five protein characteristic, ProtAlbert transformer suggests appropriate amino acids for each position.
The predicted profile by the transformer shows high quality compared to the HSSP profile.
- Is Part Of:
- Computational biology and chemistry. Volume 99(2022)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 99(2022)
- Issue Display:
- Volume 99, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 99
- Issue:
- 2022
- Issue Sort Value:
- 2022-0099-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-08
- Subjects:
- HSSP profile -- Nearest-neighbor interactions -- Protein tertiary structure
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2022.107717
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 22671.xml