R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data. (February 2020)
- Record Type:
- Journal Article
- Title:
- R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data. (February 2020)
- Main Title:
- R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data
- Authors:
- Bania, Rubul Kumar
Halder, Anindya
- Abstract:
- Highlights: An effective ensemble attribute selection technique based on the rough set concept is proposed for selecting highly relevant, highly significant and less redundant attributes from large medical datasets. A novel n-set intersection method is also proposed to reduce bias during the ensemble attribute selection process. The kNN imputation method is applied for missing value treatment before selecting the minimal attribute set from the given data. The proposed method is applied to various publicly available real-life medical datasets collected from the UCI machine learning repository. The proposed method is compared with five other state-of-the-art attribute selection methods. Experimental results support the superiority of the proposed method over the other methods on several evaluation measures. A paired t-test confirms the statistical significance of the better results of the proposed method over its counterparts.
Abstract:
Background and Objective: Retrieving meaningful information from high-dimensional datasets is an important and challenging task. Medical datasets typically suffer from several issues, such as the curse of dimensionality, uncertainty, missing values, and irrelevant and redundant attributes. Any machine learning technique applied to such data without preprocessing generally takes considerable computational time and may degrade the performance of the model.
Methods: In this article, R-Ensembler, a parameter-free greedy ensemble attribute selection method, is proposed. It adopts rough set theory, using attribute-class, attribute-significance and attribute-attribute relevance measures to select, from a pool of different attribute subsets, a subset of attributes that are most relevant, significant and non-redundant, in order to predict the presence or absence of different diseases in medical datasets. The main role of the proposed ensembler is to combine multiple subsets of attributes produced by different rough set filters into an optimal subset of attributes for the subsequent classification task. A novel n-set intersection method is also proposed to reduce bias during the attribute selection process. Before the minimal attribute set is selected by R-Ensembler, the dataset is preprocessed with the k nearest neighbour (kNN) imputation method for missing value treatment.
Results: Experiments are carried out on seven benchmark medical datasets collected from the University of California at Irvine (UCI) repository. The performance of the proposed ensemble method is compared with five state-of-the-art attribute selection algorithms, with results measured using three benchmark classifiers: Naïve Bayes, decision trees and random forest. Experimental results support the superiority of the proposed R-Ensembler method over the other attribute selection algorithms. Paired t-tests on the average accuracies produced by the classifiers on the reduced datasets obtained by the proposed and counterpart attribute selection methods confirm the statistical significance of the better reduced attribute subsets achieved by R-Ensembler.
Conclusion: The proposed ensemble method turned out to be very effective for selecting highly relevant, highly significant and less redundant attributes from a pool of different subsets of attributes. …
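The two preprocessing ideas named in the abstract, kNN imputation of missing values and combining several attribute subsets by intersection, can be illustrated with a minimal sketch. This is not the authors' R-Ensembler: this record does not describe the paper's intersection scheme or its rough set relevance measures, so a plain set intersection stands in for the former and the latter are omitted. The function names `knn_impute` and `common_attributes`, the sample rows, and the attribute names are all hypothetical.

```python
from math import sqrt


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that attribute over the k nearest rows.

    Distance is the root-mean-square difference over attributes observed in
    both rows. Only rows that have a value for the missing attribute are
    considered as neighbours. This is an illustrative assumption, not the
    imputation variant used in the paper.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no row has this attribute
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def common_attributes(subsets):
    """Return the attributes present in every subset, sorted for stable output."""
    if not subsets:
        return []
    return sorted(set.intersection(*(set(s) for s in subsets)))


# Hypothetical data: the first row is missing its third attribute.
rows = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
imputed = knn_impute(rows, k=2)  # the gap becomes the mean of 3.0 and 5.0

# Hypothetical attribute subsets, as if produced by three different filters.
subsets = [{"age", "bp", "chol"}, {"bp", "chol", "sex"}, {"chol", "bp"}]
selected = common_attributes(subsets)  # ['bp', 'chol']
```

A strict intersection keeps only attributes that every filter agrees on, which is one simple way to limit the influence of any single filter. It can also return an empty set when the filters disagree widely.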
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 184(2020)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 184(2020)
- Issue Display:
- Volume 184, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 184
- Issue:
- 2020
- Issue Sort Value:
- 2020-0184-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-02
- Subjects:
- Rough set -- kNN Imputation -- Ensemble -- Dependency -- Classification
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2019.105122
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21625.xml