A benchmark of optimally folded protein structures using integer programming and the 3D-HP-SC model. (February 2020)
- Record Type:
- Journal Article
- Title:
- A benchmark of optimally folded protein structures using integer programming and the 3D-HP-SC model. (February 2020)
- Main Title:
- A benchmark of optimally folded protein structures using integer programming and the 3D-HP-SC model
- Authors:
- Hattori, Leandro Takeshi
Gutoski, Matheus
Vargas Benítez, César Manuel
Nunes, Luiz Fernando
Lopes, Heitor Silvério
- Abstract:
- Highlights: The proposed method finds optimal folds with the maximum number of hydrophobic side-chain contacts. The computational effort of the method grows exponentially with the number of hydrophobic amino acids. The linear correlation between the number of hydrophobic side-chain contacts and the number of hydrophobic amino acids may establish an upper bound for further studies. Results indicate that the best range of thresholds to define a hydrophobic contact is between 5.2 and 8.2 Å.
In this range, the proposed method gives the conformations most similar to real-world proteins. Maximizing hydrophobic contacts alone is not enough to drive the folding process toward accurate predictions of real protein structures. A benchmark of 17 real protein sequences, optimally folded according to the 3D-HP-SC model, is provided. All the software developed is freely available, including the program for extracting biological sequences and converting them to the 3D-HP-SC model, and the integer programming optimization.
Abstract: The Protein Structure Prediction (PSP) problem comprises, among other issues, forecasting the three-dimensional native structure of proteins using only their primary structure information. Most computational studies in this area use synthetic data instead of real biological data. However, the closer the data are to the real world, the greater the impact and applicability of the results. This work presents 17 real protein sequences extracted from the Protein Data Bank as a benchmark for the PSP problem using the three-dimensional Hydrophobic-Polar with Side-Chains model (3D-HP-SC). The native structure of these proteins, according to the model, was found by maximizing the number of hydrophobic contacts between the side-chains of amino acids. The problem was formulated as an optimization problem and solved with an Integer Programming approach. Although the method solves the problem optimally, its processing time grows exponentially. Because of these computational limitations, the method is a proof of concept and is not applicable to long sequences. For unknown sequences, an upper bound on the number of hydrophobic contacts in this model can be estimated from its linear relationship with the number of hydrophobic residues. Comparing the predicted and biological structures showed the highest similarity for distance thresholds of about 5.2–8.2 Å.
Both the dataset and the programs developed will be freely available to foster further research in the area.
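The objective described above, maximizing hydrophobic (H) side-chain contacts in the 3D-HP-SC model, can be illustrated on very short sequences. The record states that the authors used Integer Programming; the minimal sketch below swaps that for plain exhaustive backtracking, which is only feasible for a handful of residues. It assumes that each residue has one backbone bead and one side-chain bead on the simple cubic lattice, that each side chain sits next to its own backbone bead, that all beads occupy distinct sites, and that a contact is any pair of H side chains at unit lattice distance (consecutive residues included). The function names `hh_side_chain_contacts` and `best_fold` are hypothetical.

```python
from itertools import combinations

# Unit steps on the simple cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def add(p, q):
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def hh_side_chain_contacts(seq, side_chains):
    """Count pairs of H side chains on adjacent lattice sites (assumed contact rule)."""
    h = [side_chains[i] for i, aa in enumerate(seq) if aa == "H"]
    return sum(
        1
        for a, b in combinations(h, 2)
        if sum(abs(x - y) for x, y in zip(a, b)) == 1  # Manhattan distance 1
    )


def best_fold(seq):
    """Exhaustive search (not Integer Programming) for the max HH side-chain contacts."""
    n = len(seq)
    backbone, side_chains = [], []
    occupied = set()
    best = -1

    def place(i):
        nonlocal best
        if i == n:
            best = max(best, hh_side_chain_contacts(seq, side_chains))
            return
        # Backbone bead i: fixed at the origin for i == 0 (translation symmetry),
        # otherwise a free neighbour of backbone bead i - 1.
        candidates = [(0, 0, 0)] if i == 0 else [add(backbone[-1], s) for s in STEPS]
        for b in candidates:
            if b in occupied:
                continue
            occupied.add(b)
            backbone.append(b)
            # Side-chain bead i: a free neighbour of its own backbone bead.
            for s in STEPS:
                c = add(b, s)
                if c in occupied:
                    continue
                occupied.add(c)
                side_chains.append(c)
                place(i + 1)
                side_chains.pop()
                occupied.remove(c)
            backbone.pop()
            occupied.remove(b)

    place(0)
    return best
```

For example, `best_fold("HHH")` returns 2, while `best_fold("HPH")` returns 0: the cubic lattice is bipartite, so under these assumptions the side chains of residues i and j can only touch when j - i is odd. Each extra residue multiplies the search by up to about 25 branches, which mirrors the exponential growth in processing time reported above.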
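The record reports that predicted and biological structures agree best when a hydrophobic contact is defined by a distance threshold of about 5.2–8.2 Å. The sketch below shows one way such a threshold sweep could be set up. It assumes one representative side-chain point per residue (for example a side-chain centroid, in Å) and scores agreement with the Jaccard index of the two contact sets; both choices, and the names `hydrophobic_contacts`, `contact_similarity` and `similarity_by_threshold`, are illustrative assumptions rather than the paper's documented procedure.

```python
import math
from itertools import combinations


def hydrophobic_contacts(coords, seq, threshold):
    """Pairs (i, j), i < j, of H residues whose side-chain points are within threshold."""
    h = [i for i, aa in enumerate(seq) if aa == "H"]
    return {
        (i, j)
        for i, j in combinations(h, 2)
        if math.dist(coords[i], coords[j]) <= threshold
    }


def contact_similarity(predicted, native):
    """Jaccard overlap of two contact sets (assumed measure; 1.0 means identical)."""
    union = predicted | native
    if not union:
        return 1.0
    return len(predicted & native) / len(union)


def similarity_by_threshold(predicted_contacts, native_coords, seq, thresholds):
    """Score a predicted contact set against native contacts at each threshold."""
    return {
        t: contact_similarity(
            predicted_contacts, hydrophobic_contacts(native_coords, seq, t)
        )
        for t in thresholds
    }
```

The predicted contact set would come from the lattice fold (pairs of H side chains at unit lattice distance), while the native set is recomputed for each threshold, so sweeping the thresholds shows where the two definitions of "contact" line up best.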
- Is Part Of:
- Computational biology and chemistry. Volume 84(2020)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 84(2020)
- Issue Display:
- Volume 84, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 84
- Issue:
- 2020
- Issue Sort Value:
- 2020-0084-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-02
- Subjects:
- Biological sequences -- Hydrophobic-polar model -- Integer programming -- Protein structure problem
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2019.107192
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 12624.xml