Comment on "A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials" by Hattrick-Simpers, et al., Molecular Systems Design & Engineering, 2018, 3, 509. Issue 2 (7th February 2020)
- Record Type:
- Journal Article
- Title:
- Comment on "A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials" by Hattrick-Simpers, et al., Molecular Systems Design & Engineering, 2018, 3, 509. Issue 2 (7th February 2020)
- Main Title:
- Comment on "A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials" by Hattrick-Simpers, et al., Molecular Systems Design & Engineering, 2018, 3, 509
- Authors:
- Hattrick-Simpers, Jason
DeCost, Brian
- Abstract:
- Here we start the conversation on reproducibility and openness in materials AI by comparing two nominally identical modeling workflows.
- In this short comment we present a reproducibility study for our recent manuscript "A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials" by Hattrick-Simpers, et al., Mol. Syst. Des. Eng., 2018, 3, 509, using a suite of open-source materials data science tools. The principal goal of this study is to give the interested reader the ability to reproduce our previous machine learning model with minimal effort and then perform predictions on the holdout set used in that manuscript. In transcribing our model from the Java-based Magpie / Weka framework to the Python-based Matminer / scikit-learn framework, we noticed an unexpected discrepancy in the predictions between the two platforms. To compare the performance of nominally equivalent random forest (RF) regression models across these two platforms, we trained and evaluated 50 replicate models for each platform, using a random 90% subset of the full hydride training set for each replicate. The Magpie / Weka models showed a somewhat higher mean absolute error on the holdout set (5.6 ± 0.4) than the Matminer / scikit-learn models (4.2 ± 0.4), although the validation statistics were within error of one another. It is beyond the scope of this comment to fully analyze the ultimate source of the variance in these predictions, but we speculate that some contribution results from differences in how Magpie treats duplicate compositions in the training set and/or differences in the RF implementation between Weka and scikit-learn.
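The replicate protocol summarized in the abstract (50 random forest models, each trained on a random 90% subset of the training set and scored by mean absolute error on a fixed holdout set) can be sketched roughly as follows. This is a minimal illustration, not the authors' published workflow: it assumes scikit-learn's `RandomForestRegressor` with its default of 100 trees, uses synthetic data from `make_regression` in place of the hydride dataset, and the function name `replicate_holdout_mae` is hypothetical. The abstract does not say how the "±" values were computed; here the spread is assumed to be the sample standard deviation across replicates.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def replicate_holdout_mae(X_train, y_train, X_holdout, y_holdout,
                          n_replicates=50, subset_fraction=0.9, seed=0):
    """Hypothetical helper: fit one random forest per replicate on a random
    subset of the training data and return each replicate's holdout MAE."""
    rng = np.random.default_rng(seed)
    n_subset = int(round(subset_fraction * len(X_train)))
    maes = []
    for i in range(n_replicates):
        # Draw a random subset (90% by default) without replacement.
        idx = rng.choice(len(X_train), size=n_subset, replace=False)
        # Assumed hyperparameters; the original models' settings are not given here.
        model = RandomForestRegressor(n_estimators=100, random_state=i)
        model.fit(X_train[idx], y_train[idx])
        maes.append(mean_absolute_error(y_holdout, model.predict(X_holdout)))
    return np.array(maes)


# Synthetic stand-in data (assumption): NOT the hydride training or holdout set.
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=42)
X_train, y_train = X[:250], y[:250]
X_holdout, y_holdout = X[250:], y[250:]

maes = replicate_holdout_mae(X_train, y_train, X_holdout, y_holdout)
# Assumed summary statistic: mean ± sample standard deviation over replicates.
print(f"holdout MAE: {maes.mean():.2f} ± {maes.std(ddof=1):.2f}")
```

Running the same loop with a second framework's model in place of `RandomForestRegressor` would give the kind of side-by-side replicate distributions the comment compares; differing random subsets per replicate are what make the spread meaningful.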
- Is Part Of:
- Molecular Systems Design and Engineering. Volume 5:Issue 2(2020)
- Journal:
- Molecular Systems Design and Engineering
- Issue:
- Volume 5:Issue 2(2020)
- Issue Display:
- Volume 5, Issue 2 (2020)
- Year:
- 2020
- Volume:
- 5
- Issue:
- 2
- Issue Sort Value:
- 2020-0005-0002-0000
- Page Start:
- 589
- Page End:
- 591
- Publication Date:
- 2020-02-07
- Subjects:
- Chemistry -- Molecular aspects -- Periodicals
Chemical engineering -- Molecular aspects -- Periodicals
Nanotechnology -- Periodicals
620.5
- Journal URLs:
- http://pubs.rsc.org/en/journals/journalissues/me#!recentarticles&adv
http://www.rsc.org/
- DOI:
- 10.1039/c9me00138g
- Languages:
- English
- ISSNs:
- 2058-9689
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.856400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12917.xml