Comparative assessment of automated algorithms for the separation of one-dimensional Gaussian mixtures. (2022)
- Record Type:
- Journal Article
- Title:
- Comparative assessment of automated algorithms for the separation of one-dimensional Gaussian mixtures. (2022)
- Main Title:
- Comparative assessment of automated algorithms for the separation of one-dimensional Gaussian mixtures
- Authors:
- Lötsch, Jörn
Malkusch, Sebastian
Ultsch, Alfred
- Abstract:
- Motivation: Gaussian mixture models (GMMs) are probabilistic models commonly used in biomedical research to detect subgroup structures in data sets with one-dimensional information. Reliable model parameterization requires that the number of modes, i.e., states of the generating process, is known. However, this is rarely the case for empirically measured biomedical data. Several implementations are available that estimate GMM parameters differently. This work aims to provide a comparative evaluation of automated GMM fitting methods. Results and conclusions: The performance of commonly used algorithms for automatic parameterization and mode number determination was compared with respect to reproducing the ground truth of generated data derived from multiple normal distributions. Four main variants of Gaussian mode number detection algorithms and five variants of GMM parameter estimation methods were tested in a combinatory scenario. The combination of best performing mode number determination algorithms and GMM parameter estimation methods was then tested on artificial and real-life data sets known to display a GMM structure. None of the tested methods correctly determined the underlying data structure consistently. The likelihood ratio test had the best performance in identifying the mode number associated with the best GMM fit of the data distribution, while the Markov chain Monte Carlo (MCMC) algorithm was best for GMM parameter estimation. The combination of the two methods of number determination algorithms and GMM parameter estimation was consistently among the best and overall outperformed the available implementations. Implementation: An automated tool for the detection of GMM-based structures in (biomedical) datasets was created based on the present results and made freely available in the R library "opGMMassessment" at https://cran.r-project.org/package=opGMMassessment .
- Is Part Of:
- Informatics in medicine unlocked. Volume 34(2023)
- Journal:
- Informatics in medicine unlocked
- Issue:
- Volume 34(2023)
- Issue Display:
- Volume 34, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 34
- Issue:
- 2023
- Issue Sort Value:
- 2023-0034-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2022
- Subjects:
- Data science -- Machine learning -- Data structure detection -- Biomedical informatics
Medical informatics -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/23529148/
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.imu.2022.101113
- Languages:
- English
- ISSNs:
- 2352-9148
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24461.xml