CoPriNet: graph neural networks provide accurate and rapid compound price prediction for molecule prioritisation. (7th December 2022)
- Record Type:
- Journal Article
- Title:
- CoPriNet: graph neural networks provide accurate and rapid compound price prediction for molecule prioritisation. (7th December 2022)
- Main Title:
- CoPriNet: graph neural networks provide accurate and rapid compound price prediction for molecule prioritisation
- Authors:
- Sanchez-Garcia, Ruben
Havasi, Dávid
Takács, Gergely
Robinson, Matthew C.
Lee, Alpha
von Delft, Frank
Deane, Charlotte M.
- Abstract:
- CoPriNet can predict compound prices after being trained on 6M pairs of compounds and prices collected from the Mcule catalogue.
Compound availability is a critical property for design prioritization across the drug discovery pipeline. Historically, and despite their multiple limitations, compound-oriented synthetic accessibility scores have been used as proxies for this problem. However, the size of the catalogues of commercially available molecules has dramatically increased over the last decade, redefining the problem of compound accessibility as a matter of budget. In this paper we show that if compound prices are the desired proxy for compound availability, then synthetic accessibility scores are not effective strategies for use in selection. Our approach, CoPriNet, is a retrosynthesis-free deep learning model trained on 2D graph representations of compounds alongside their prices extracted from the Mcule catalogue. We show that CoPriNet provides price predictions that correlate far better with actual compound prices than any synthetic accessibility score. Moreover, unlike standard retrosynthesis methods, CoPriNet is rapid, with execution times comparable to popular synthetic accessibility metrics, and thus is suitable for high-throughput experiments including virtual screening and de novo compound generation. While the Mcule catalogue is a proprietary dataset, the CoPriNet source code and the model trained on the proprietary data, as well as the fraction of the catalogue (100 K compound/prices) used as test dataset, have been made publicly available at https://github.com/oxpig/CoPriNet.
- Is Part Of:
- Digital discovery. Volume 2:Number 1(2023)
- Journal:
- Digital discovery
- Issue:
- Volume 2:Number 1(2023)
- Issue Display:
- Volume 2, Issue 1 (2023)
- Year:
- 2023
- Volume:
- 2
- Issue:
- 1
- Issue Sort Value:
- 2023-0002-0001-0000
- Page Start:
- 103
- Page End:
- 111
- Publication Date:
- 2022-12-07
- Subjects:
- Chemistry -- Data processing -- Periodicals
Medical sciences -- Data processing -- Periodicals
Machine learning -- Periodicals
542.85
- Journal URLs:
- https://www.rsc.org/journals-books-databases/about-journals/digital-discovery/
http://www.rsc.org/
- DOI:
- 10.1039/d2dd00071g
- Languages:
- English
- ISSNs:
- 2635-098X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26029.xml