Distributed learning on 20 000+ lung cancer patients – The Personal Health Train. (March 2020)
- Record Type:
- Journal Article
- Title:
- Distributed learning on 20 000+ lung cancer patients – The Personal Health Train. (March 2020)
- Main Title:
- Distributed learning on 20 000+ lung cancer patients – The Personal Health Train
- Authors:
- Deist, Timo M.
Dankers, Frank J.W.M.
Ojha, Priyanka
Scott Marshall, M.
Janssen, Tomas
Faivre-Finn, Corinne
Masciocchi, Carlotta
Valentini, Vincenzo
Wang, Jiazhou
Chen, Jiayan
Zhang, Zhen
Spezi, Emiliano
Button, Mick
Jan Nuyttens, Joost
Vernhout, René
van Soest, Johan
Jochems, Arthur
Monshouwer, René
Bussink, Johan
Price, Gareth
Lambin, Philippe
Dekker, Andre
- Abstract:
- Highlights: Machine learning without sharing patient data, quickly and at scale. Connecting 8 oncology institutes from 5 countries. Survival prediction modelling on 20 000+ patient cases. A study completed in 4 months.
Abstract: Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute.
Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots.
Results: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015.
Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimens. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.
- Is Part Of:
- Radiotherapy and oncology. Volume 144(2020)
- Journal:
- Radiotherapy and oncology
- Issue:
- Volume 144(2020)
- Issue Display:
- Volume 144, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 144
- Issue:
- 2020
- Issue Sort Value:
- 2020-0144-2020-0000
- Page Start:
- 189
- Page End:
- 200
- Publication Date:
- 2020-03
- Subjects:
- Lung cancer -- Big data -- Distributed learning -- Federated learning -- Machine learning -- Survival analysis -- Prediction modeling -- FAIR data
Oncology -- Periodicals
Radiotherapy -- Periodicals
Tumors -- Periodicals
Medical Oncology -- Periodicals
Neoplasms -- radiotherapy -- Periodicals
Radiothérapie -- Périodiques
Cancérologie -- Périodiques
Tumeurs -- Périodiques
Electronic journals
616.9940642
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01678140
http://www.clinicalkey.com/dura/browse/journalIssue/01678140
http://www.clinicalkey.com.au/dura/browse/journalIssue/01678140
http://www.estro.org/
http://www.elsevier.com/journals
http://www.journals.elsevier.com/radiotherapy-and-oncology/
- DOI:
- 10.1016/j.radonc.2019.11.019
- Languages:
- English
- ISSNs:
- 0167-8140
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 7240.790000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22460.xml