Application of data augmentation techniques towards metabolomics. (September 2022)
- Record Type:
- Journal Article
- Title:
- Application of data augmentation techniques towards metabolomics. (September 2022)
- Main Title:
- Application of data augmentation techniques towards metabolomics
- Authors:
- Moreno-Barea, Francisco J.
Franco, Leonardo
Elizondo, David
Grootveld, Martin
- Abstract:
- Niemann–Pick Class 1 (NPC1) disease is a rare and debilitating neurodegenerative lysosomal storage disease (LSD). Metabolomics datasets of NPC1 patients available to perform this type of analysis are often limited in the number of samples and severely unbalanced. In order to improve the predictive capability and identify new biomarkers in an NPC1 disease urinary dataset, data augmentation (DA) techniques based on computational intelligence have been employed to create synthetic samples, i.e. the addition of noise, oversampling techniques and conditional generative adversarial networks. These techniques have been used to evaluate their predictive capacities on a set of urine samples donated by 13 untreated NPC1 disease and 47 heterozygous (parental) carrier control participants. Results on the prediction have also been obtained using different machine learning classification models and the partial least squares techniques. These results provide strong evidence for the ability of DA techniques to generate good quality synthetic data. Results acquired show increases in sensitivity of 20%–50%, an F1 score of 6%–30%, and a predictive capacity of 0.3 (out of 1). Additionally, more conventional forms of multivariate data analysis have been employed. These have allowed the detection of unusual urinary metabolite profiles, and the identification of biomarkers through the use of synthetically augmented datasets. Results indicate that urinary branched-chain amino acids such as valine, 3-aminoisobutyrate and quinolinate, may be employable as valuable biomarkers for the diagnosis and prognostic monitoring of NPC1 disease.
Highlights: Niemann–Pick type C is a very rare neurodegenerative lysosomal storage disease. Niemann–Pick type C metabolomics datasets are often scarce, containing few samples. Data Augmentation techniques were applied to create additional synthetic samples. Prediction performance shows a significant improvement in sensitivity (20%–50%). DA techniques allow the identification of relevant urinary metabolomics biomarkers.
- Is Part Of:
- Computers in biology and medicine. Volume 148(2022)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 148(2022)
- Issue Display:
- Volume 148, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 148
- Issue:
- 2022
- Issue Sort Value:
- 2022-0148-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Data augmentation -- Machine learning -- Metabolomics -- Niemann–Pick type C disease -- Rare diseases
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2022.105916
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23692.xml