A method for machine learning generation of realistic synthetic datasets for validating healthcare applications. (12th February 2022)
- Record Type:
- Journal Article
- Title:
- A method for machine learning generation of realistic synthetic datasets for validating healthcare applications. (12th February 2022)
- Main Title:
- A method for machine learning generation of realistic synthetic datasets for validating healthcare applications
- Authors:
- Arvanitis, Theodoros N
White, Sean
Harrison, Stuart
Chaplin, Rupert
Despotou, George
- Abstract:
- Digital health applications can improve the quality and effectiveness of healthcare by offering users a number of new tools, which are often considered medical devices. Assuring their safe operation requires, among other things, clinical validation, which needs large datasets to test them in realistic clinical scenarios. Access to such datasets is challenging because of patient privacy concerns. Development of synthetic datasets is seen as a potential alternative. The objective of the paper is to develop a method for generating realistic synthetic datasets that are statistically equivalent to real clinical datasets, and to demonstrate that the Generative Adversarial Network (GAN) based approach is fit for purpose. A generative adversarial network was implemented and trained, in a series of six experiments, using numerical and categorical variables, including ICD-9 and laboratory codes, from three clinically relevant datasets. A number of contextual steps provided the success criteria for the synthetic dataset. A synthetic dataset was generated that exhibits statistical characteristics very similar to those of the real dataset. Pairwise association of variables is very similar. A high degree of Jaccard similarity and a successful K-S test further support this. The proof of concept of generating realistic synthetic datasets was successful, with the approach showing promise for further work.
- Is Part Of:
- Health informatics journal. Volume 28:Number 2(2022)
- Journal:
- Health informatics journal
- Issue:
- Volume 28:Number 2(2022)
- Issue Display:
- Volume 28, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 28
- Issue:
- 2
- Issue Sort Value:
- 2022-0028-0002-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-02-12
- Subjects:
- Generative adversarial networks -- certification -- machine learning -- realistic synthetic datasets -- safety
Medical informatics -- Periodicals
610.285
- Journal URLs:
- http://jhi.sagepub.com/
http://www.uk.sagepub.com/home.nav
- DOI:
- 10.1177/14604582221077000
- Languages:
- English
- ISSNs:
- 1460-4582
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24600.xml
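
The abstract names Jaccard similarity and the K-S test as evidence that the synthetic data resembles the real data. The sketch below shows what those two checks compute, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function names `jaccard_similarity` and `ks_statistic` are hypothetical, the toy values are made up for the example, and the record does not say how the paper applied either measure.

```python
from bisect import bisect_right


def jaccard_similarity(real_codes, synthetic_codes):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of codes."""
    a, b = set(real_codes), set(synthetic_codes)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def ks_statistic(real_values, synthetic_values):
    """Two-sample K-S statistic: the largest gap between the empirical CDFs."""
    xs, ys = sorted(real_values), sorted(synthetic_values)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap between two step functions can only change at observed values.
    for v in set(xs) | set(ys):
        cdf_x = bisect_right(xs, v) / len(xs)  # share of real values <= v
        cdf_y = bisect_right(ys, v) / len(ys)  # share of synthetic values <= v
        d = max(d, abs(cdf_x - cdf_y))
    return d


if __name__ == "__main__":
    # Toy inputs, invented for illustration only (not data from the paper).
    real_codes = ["250.00", "401.9", "428.0"]
    synthetic_codes = ["401.9", "428.0", "585.9"]
    print(jaccard_similarity(real_codes, synthetic_codes))

    real_lab = [1.0, 2.0, 3.0, 4.0]
    synthetic_lab = [3.0, 4.0, 5.0, 6.0]
    print(ks_statistic(real_lab, synthetic_lab))
```

A Jaccard value near 1 means the two datasets use almost the same set of codes, and a K-S statistic near 0 means the two numerical distributions are close. Turning the statistic into a pass or fail decision additionally requires a significance level, which this sketch leaves out.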