A transfer learning protocol for chemical catalysis using a recurrent neural network adapted from natural language processing. (22nd April 2022)
- Record Type:
- Journal Article
- Title:
- A transfer learning protocol for chemical catalysis using a recurrent neural network adapted from natural language processing. (22nd April 2022)
- Main Title:
- A transfer learning protocol for chemical catalysis using a recurrent neural network adapted from natural language processing
- Authors:
- Singh, Sukriti
Sunoj, Raghavan B.
- Abstract:
- A transfer learning protocol for yield and enantioselectivity predictions of transition metal- and organo-catalytic reactions, suitable for small (<400) to large (>4000) data regimes.
Minimizing the time and material investments in discovering molecular catalysis would be immensely beneficial. The high contemporary importance of homogeneous catalysis in general, and asymmetric catalysis in particular, makes them the most compelling systems for leveraging the power of machine learning (ML). We see an overarching connection between powerful ML tools, such as the transfer learning (TL) used in natural language processing (NLP), and the chemical space, when the latter is described using SMILES strings conducive to representation learning. We developed a TL protocol, first trained on 1 million molecules, and exploited its ability for accurate predictions of the yield and enantiomeric excess for three diverse reaction classes, encompassing over 5000 transition metal- and organo-catalytic reactions. The TL-predicted yields in the Pd-catalyzed Buchwald–Hartwig cross-coupling reaction offered the highest accuracy, with an impressive RMSE of 4.89, implying that 97% of the predicted yields were within 10 units of the actual experimental value. For catalytic asymmetric reactions, such as the enantioselective N,S-acetal formation and asymmetric hydrogenation, RMSEs of 8.65 and 8.38 were obtained respectively, with the predicted enantioselectivities (%ee) within 10 units of their true values ∼90% of the time. The method is highly time-economic, as the workflow bypasses collecting molecular descriptors, and is hence of direct relevance to high-throughput discovery of catalytic transformations.
- Is Part Of:
- Digital discovery. Volume 1:Number 3(2022)
- Journal:
- Digital discovery
- Issue:
- Volume 1:Number 3(2022)
- Issue Display:
- Volume 1, Issue 3 (2022)
- Year:
- 2022
- Volume:
- 1
- Issue:
- 3
- Issue Sort Value:
- 2022-0001-0003-0000
- Page Start:
- 303
- Page End:
- 312
- Publication Date:
- 2022-04-22
- Subjects:
- Chemistry -- Data processing -- Periodicals
Medical sciences -- Data processing -- Periodicals
Machine learning -- Periodicals
542.85
- Journal URLs:
- https://www.rsc.org/journals-books-databases/about-journals/digital-discovery/
http://www.rsc.org/
- DOI:
- 10.1039/d1dd00052g
- Languages:
- English
- ISSNs:
- 2635-098X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22352.xml