Cross-project defect prediction using data sampling for class imbalance learning: an empirical study. Issue 2 (4th March 2021)
- Record Type:
- Journal Article
- Title:
- Cross-project defect prediction using data sampling for class imbalance learning: an empirical study. Issue 2 (4th March 2021)
- Main Title:
- Cross-project defect prediction using data sampling for class imbalance learning: an empirical study
- Authors:
- Goel, Lipika
Sharma, Mayank
Khatri, Sunil Kumar
Damodaran, D.
- Abstract:
- The availability of defect data from different projects makes cross-project defect prediction an open research issue in software engineering. In cross-project defect prediction, the source and target projects are different. The prediction model is trained on data from the source projects and then tested on the target project's data. The data drawn from the varying projects leads to a highly imbalanced source dataset, and the performance of the predictive model degrades because of this imbalance. This is known as the class imbalance problem in machine learning. This paper conducts a two-fold empirical analysis. First, it evaluates whether data sampling techniques can handle the class imbalance problem and improve the performance of the predictive model for cross-project defect prediction (CPDP). Second, it evaluates whether the results of CPDP after data sampling are comparable to those of within-project defect prediction (WPDP). Ensemble learning classifiers are used as the predictive model over 12 publicly available object-oriented project datasets. The experimental results indicate that SMOTE oversampling can be applied to overcome the problem of class imbalance in CPDP. It also gives results comparable to WPDP with statistical significance.
- Is Part Of:
- International journal of parallel, emergent and distributed systems. Volume 36:Issue 2(2021)
- Journal:
- International journal of parallel, emergent and distributed systems
- Issue:
- Volume 36:Issue 2(2021)
- Issue Display:
- Volume 36, Issue 2 (2021)
- Year:
- 2021
- Volume:
- 36
- Issue:
- 2
- Issue Sort Value:
- 2021-0036-0002-0000
- Page Start:
- 130
- Page End:
- 143
- Publication Date:
- 2021-03-04
- Subjects:
- Cross project defect prediction (CPDP) -- within project defect prediction (WPDP) -- class imbalance learning -- data sampling -- ensemble learning -- synthetic minority oversampling technique (SMOTE)
Parallel computers -- Periodicals
Electronic data processing -- Distributed processing -- Periodicals
Computer algorithms -- Periodicals
004.35
- Journal URLs:
- http://www.tandfonline.com/toc/gpaa20/current
http://www.tandfonline.com/
- DOI:
- 10.1080/17445760.2019.1650039
- Languages:
- English
- ISSNs:
- 1744-5760
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.441300
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 22947.xml