A comparison of classifiers for predicting the class color of fluorescent proteins. (December 2019)
- Record Type:
- Journal Article
- Title:
- A comparison of classifiers for predicting the class color of fluorescent proteins. (December 2019)
- Main Title:
- A comparison of classifiers for predicting the class color of fluorescent proteins
- Authors:
- da Silva, Roger Sá
Marins, Luis Fernando
Almeida, Daniela Volcan
dos Santos Machado, Karina
Werhli, Adriano V.
- Abstract:
- Highlights: A data set of 109 structures of colored proteins is created. For the first time classification algorithms of protein color class are compared with the same data set. Residue proximity to the chromophore is an important attribute for class color classification. Performance of decision trees, artificial neural networks support vector machines and random forests are similar. Decision trees are the best option due to its interpretability.
Abstract: Fluorescent proteins have been applied in a wide variety of fields ranging from basic science to industrial applications. Apart from the naturally occurring fluorescent proteins, there is a growing interest in genetically modified variants that emit light in a specific wavelength. Genetically modifying a protein is not an easy task, especially because the exchange of one residue by other has to achieve the desired property while maintaining protein stability. To help in the choice of residue exchange, computational methods are applied to predict function and stability of proteins. In this work we have prepared a dataset composed by 109 fluorescent proteins and tested four classical supervised classification algorithms: artificial neural networks (ANNs), decision trees (DTs), support vector machines (SVMs) and random forests (RFs). This is the first time that algorithms are compared in this task. Results of comparing the algorithm's performance shows that DT, SVM and RF were significantly better than ANNs, and RF was the best method in all the scenarios. However, the interpretability of DTs is highly relevant and can provide important clues about the mechanisms involved in protein color emission. The results are promising and indicate that the use of in silico methods can greatly reduce the time and cost of the in vitro experiments.
- Is Part Of:
- Computational biology and chemistry. Volume 83(2019)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 83(2019)
- Issue Display:
- Volume 83, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 83
- Issue:
- 2019
- Issue Sort Value:
- 2019-0083-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-12
- Subjects:
- Data mining -- Classification -- Fluorescent proteins -- Structural biology
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2019.107089
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23171.xml