An empirical comparison of validation methods for software prediction models. Issue 8 (13th June 2021)
- Record Type:
- Journal Article
- Title:
- An empirical comparison of validation methods for software prediction models. Issue 8 (13th June 2021)
- Main Title:
- An empirical comparison of validation methods for software prediction models
- Authors:
- Ali, Asad
Gravino, Carmine
- Abstract:
- Abstract: Model validation methods (e.g., k‐fold cross‐validation) use historical data to predict how well an estimation technique (e.g., random forest) performs on the current (or future) data. Studies in the contexts of software development effort estimation (SDEE) and software fault prediction (SFP) have used and investigated different model validation methods. However, there are no conclusive indications to suggest which model validation method has a major impact on the prediction accuracy and stability of estimation techniques. Some studies have investigated model validation methods specific to data about either SDEE or SFP. To the best of our knowledge, there is no study in the literature that has employed different validation methods with both SDEE and SFP data. The aim of this paper is to consider different methods (10) from the family of cross‐validation (CV) and bootstrap validation methods to identify which one contributes to obtaining a better prediction accuracy for both types of data. We also evaluate which model validation methods allow the estimation techniques to provide stable performances (i.e., with lower variance). To this aim, we present an empirical study involving six datasets from the domain of SDEE and six datasets from the SFP domain. The results reveal that repeated 10‐fold CV with SDEE and optimistic boot with SFP data are the model validation methods that provide a better prediction accuracy in a greater number of experiments than the other model validation methods. Furthermore, a model validation method can improve the prediction accuracy up to 60% with SDEE data and up to 36% when employing SFP data. The analysis also reveals that repeated fivefold CV produces more stable performances when the experiments are repeated on the same data.
Abstract: Investigating cross‐validation (CV) and bootstrap validation methods in software development effort estimation (SDEE) and software fault prediction (SFP). Ten‐fold CV with SDEE and optimistic boot with SFP data provide better predictions in a greater number of experiments than the other methods. A validation method can improve prediction accuracy up to 60% with SDEE data and up to 36% with SFP data, while repeated fivefold CV produces more stable performances when experiments are repeated on the same data.
- Is Part Of:
- Journal of software. Volume 33:Issue 8(2021)
- Journal:
- Journal of software
- Issue:
- Volume 33:Issue 8(2021)
- Issue Display:
- Volume 33, Issue 8 (2021)
- Year:
- 2021
- Volume:
- 33
- Issue:
- 8
- Issue Sort Value:
- 2021-0033-0008-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2021-06-13
- Subjects:
- model validation methods -- software development efforts estimation -- software faults prediction
Software engineering -- Periodicals
Computer software -- Development -- Periodicals
Software maintenance -- Periodicals
005.1
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2047-7481
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/smr.2367
- Languages:
- English
- ISSNs:
- 2047-7473
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17837.xml