PrePCI: A structure‐ and chemical similarity‐informed database of predicted protein compound interactions. (16th March 2023)
- Record Type:
- Journal Article
- Title:
- PrePCI: A structure‐ and chemical similarity‐informed database of predicted protein compound interactions. (16th March 2023)
- Main Title:
- PrePCI: A structure‐ and chemical similarity‐informed database of predicted protein compound interactions
- Authors:
- Trudeau, Stephen J.
Hwang, Howook
Mathur, Deepika
Begum, Kamrun
Petrey, Donald
Murray, Diana
Honig, Barry
- Abstract:
- We describe the Predicting Protein–Compound Interactions (PrePCI) database which comprises over 5 billion predicted interactions between 6.8 million chemical compounds and 19,797 human proteins. PrePCI relies on a proteome‐wide database of structural models based on both traditional modeling techniques and the AlphaFold Protein Structure Database. Sequence‐ and structural similarity‐based metrics are established between template proteins, T, in the Protein Data Bank that bind compounds, C, and query proteins in the model database, Q. When the metrics exceed threshold values, it is assumed that C also binds to Q with a likelihood ratio (LR) derived from machine learning. If the relationship is based on structural similarity, the LR is based on a scoring function that measures the extent to which C is compatible with the binding site of Q as described in the LT‐scanner algorithm. For every predicted complex derived in this way, chemical similarity based on the Tanimoto coefficient identifies other small molecules that may bind to Q. An overall LR for the binding of C to Q is obtained from Naive Bayesian statistics. The PrePCI database can be queried by entering a UniProt ID or gene name for a protein to obtain a list of compounds predicted to bind to it along with associated LRs. Alternatively, entering an identifier for the compound outputs a list of proteins it is predicted to bind. Specific applications of the database to lead discovery, elucidation of drug mechanism of action, and biological function annotation are described.
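The abstract names two standard building blocks: the Tanimoto coefficient for chemical similarity and a Naive Bayesian combination of likelihood ratios. The sketch below illustrates both in their textbook form only. It is not the PrePCI implementation: the function names, the representation of fingerprints as sets of on-bit indices, and the 0.7 similarity threshold are all assumptions made for illustration, and the record does not state the thresholds or features actually used.

```python
from math import prod


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def combine_likelihood_ratios(lrs: list[float]) -> float:
    """Naive Bayes combination: LRs from evidence assumed independent multiply."""
    return prod(lrs)


def similar_compounds(
    reference: set[int],
    library: dict[str, set[int]],
    threshold: float = 0.7,  # hypothetical cutoff, not taken from the source
) -> list[tuple[str, float]]:
    """Return (name, similarity) for library compounds at or above the threshold."""
    hits = [(name, tanimoto(reference, fp)) for name, fp in library.items()]
    return sorted(
        [(name, s) for name, s in hits if s >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    bound_compound = {1, 2, 3, 4}  # toy fingerprint of a compound C bound to a template
    library = {"analog": {1, 2, 3}, "unrelated": {7, 8}}
    print(similar_compounds(bound_compound, library))
    # Two toy evidence sources, e.g. a structure-based LR and a similarity-based LR
    print(combine_likelihood_ratios([2.0, 5.0]))
```

Multiplying the ratios is only valid under the independence assumption that gives Naive Bayes its name; correlated evidence sources would overstate the combined LR.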
- Is Part Of:
- Protein science. Volume 32:Number 4(2023)
- Journal:
- Protein science
- Issue:
- Volume 32:Number 4(2023)
- Issue Display:
- Volume 32, Issue 4 (2023)
- Year:
- 2023
- Volume:
- 32
- Issue:
- 4
- Issue Sort Value:
- 2023-0032-0004-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2023-03-16
- Subjects:
- chemical similarity -- protein–compound interactions -- protein–compound database -- structural alignment
Proteins -- Periodicals
572.6
- Journal URLs:
- http://www.proteinscience.org/
http://www3.interscience.wiley.com/journal/121502357/
http://onlinelibrary.wiley.com/
http://firstsearch.oclc.org
- DOI:
- 10.1002/pro.4594
- Languages:
- English
- ISSNs:
- 0961-8368
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 6936.105500
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 26827.xml