An automated process for supporting decisions in clustering-based data analysis. (June 2022)
- Record Type:
- Journal Article
- Title:
- An automated process for supporting decisions in clustering-based data analysis. (June 2022)
- Main Title:
- An automated process for supporting decisions in clustering-based data analysis
- Authors:
- Bernabé-Díaz, José Antonio
Franco, Manuel
Vivo, Juana-María
Quesada-Martínez, Manuel
Fernández-Breis, Jesualdo T.
- Abstract:
- Highlights: Our approach analyzes the reliability of clusters generated from quantitative metrics. We use two statistical properties: the stability and the goodness of classifications. Our method helps select the most useful metrics for analyzing the datasets. Our method makes it possible to account for heterogeneity in datasets. The usefulness of our approach is illustrated in three use cases.
Abstract: Background and objective: Metrics are commonly used by biomedical researchers and practitioners to measure and evaluate properties of individuals, instruments, models, methods, or datasets. Because there is no standardized validation procedure for metrics, it is assumed that a metric appropriate for analyzing a dataset in a certain domain will also be appropriate for other datasets in the same domain. However, such generalizability cannot be taken for granted, since the behavior of a metric can vary across scenarios. Studying this behavior is the objective of this paper, since it allows a metric's reliability to be assessed before any conclusions are drawn about biomedical datasets. Methods: We present a method that supports the evaluation of the behavior of quantitative metrics on datasets. Our approach assesses a metric through clustering-based data analysis and supports decision-making about the optimal classification. It applies two important criteria of unsupervised classification validation, namely the stability and the goodness of the clusters, calculated on the clusterings generated by the metric. Our evaluomeR tool makes the method accessible to biomedical researchers. Results: The analytical power of our method is shown by applying it to analyze (1) the behavior of the impact factor metric for a series of journal categories; (2) which structural metrics provide a better partitioning of the content of a repository of biomedical ontologies; and (3) the sources of heterogeneity in effect size metrics of biomedical primary studies. Conclusions: Statistical properties such as the stability and goodness of classifications allow for a useful analysis of the behavior of quantitative metrics, which can support decisions about which metrics to apply to a certain dataset.
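The abstract names two validation criteria, the stability and the goodness of the clusterings a metric produces, without detailing how they are computed. The Python sketch below illustrates one common way to compute each for the values of a single metric: one-dimensional k-means, bootstrap Jaccard similarity for stability (the idea popularized by `clusterboot` in the R package fpc), and mean silhouette width for goodness. It is not evaluomeR's implementation; the function names (`kmeans_1d`, `mean_silhouette`, `jaccard`, `bootstrap_stability`), the deterministic initialization, the number of bootstrap runs, and the synthetic data are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on one-dimensional data; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic start (an assumption): centers at evenly spaced order statistics.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the previous center if a cluster becomes empty.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(values, labels):
    """Goodness: average silhouette width s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - w) for j, w in enumerate(values) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(abs(v - w) for j, w in enumerate(values) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_stability(values, k, n_boot=50, seed=0):
    """Stability: mean best-match Jaccard between original and bootstrap clusters."""
    rng = random.Random(seed)
    n = len(values)
    base = kmeans_1d(values, k)
    base_sets = [{i for i in range(n) if base[i] == c} for c in range(k)]
    totals = [0.0] * k
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        boot_labels = kmeans_1d([values[i] for i in idx], k)
        present = set(idx)
        # Express bootstrap clusters as sets of original point indices.
        boot_sets = [{idx[p] for p in range(n) if boot_labels[p] == c} for c in range(k)]
        for c, original in enumerate(base_sets):
            # Compare only on points that were drawn into this bootstrap sample.
            totals[c] += max(jaccard(original & present, b) for b in boot_sets)
    # Average over bootstrap runs, then over clusters.
    return mean(t / n_boot for t in totals)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical values of one metric: two well-separated groups of 30.
    metric = [rng.gauss(0, 1) for _ in range(30)] + [rng.gauss(10, 1) for _ in range(30)]
    for k in (2, 3, 4):
        labels = kmeans_1d(metric, k)
        print(k, round(bootstrap_stability(metric, k), 3), round(mean_silhouette(metric, labels), 3))
```

Comparing both scores across several values of k, and across several metrics on the same dataset, mirrors in spirit the kind of decision the paper sets out to support: under these assumptions, a partition that scores high on both criteria is the more reliable one.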
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 219(2022)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 219(2022)
- Issue Display:
- Volume 219, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 219
- Issue:
- 2022
- Issue Sort Value:
- 2022-0219-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- Evaluation metrics -- Clustering-based data analysis -- Unsupervised classification -- Structural metrics -- Meta-analysis
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2022.106765
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22281.xml