The impact of feature types, classifiers, and data balancing techniques on software vulnerability prediction models. Issue 9 (22nd April 2019)
- Record Type:
- Journal Article
- Title:
- The impact of feature types, classifiers, and data balancing techniques on software vulnerability prediction models. Issue 9 (22nd April 2019)
- Main Title:
- The impact of feature types, classifiers, and data balancing techniques on software vulnerability prediction models
- Authors:
- Kaya, Aydin
Keceli, Ali Seydi
Catal, Cagatay
Tekinerdogan, Bedir
- Editors:
- MaLTeSQuE Guest Editors
- Other Names:
- Ampatzoglou, Apostolos (guest editor)
Arcelli Fontana, Francesca (guest editor)
Palomba, Fabio (guest editor)
Walter, Bartosz (guest editor)
- Abstract:
- Abstract: Software vulnerabilities form an increasing security risk for software systems that might be exploited to attack and harm the system. Some security vulnerabilities can be detected by static analysis tools and penetration testing, but these usually suffer from relatively high false positive rates. Software vulnerability prediction (SVP) models can be used to categorize software components into vulnerable and neutral components before the software testing phase and thereby increase the efficiency and effectiveness of the overall verification process. The performance of a vulnerability prediction model is usually affected by the adopted classification algorithm, the adopted features, and the data balancing approach. In this study, we empirically investigate the effect of these factors on the performance of SVP models. Our experiments cover four data balancing methods, seven classification algorithms, and three feature types. The experimental results show that data balancing methods are effective for highly unbalanced datasets, text-based features are more useful, and ensemble-based classifiers mostly provide better results. For smaller datasets, the Random Forest algorithm provides the best performance; for larger datasets, RusboostTree achieves better performance.
In this study, the effect of data balancing on the performance of software vulnerability prediction models is investigated. Four balancing methods and seven classification algorithms are examined. Software metrics, text-mining based features, and their combination are tested for all the models. Key findings: highly unbalanced software vulnerability prediction datasets should be balanced to accurately estimate the vulnerability-prone software components, the software metric features are more useful to define the vulnerable samples, and the ensemble-based classifiers provide better results.
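The data balancing step highlighted in the abstract can be illustrated with a minimal sketch. This is not the paper's actual code: it shows random undersampling (the "RUS" step underlying RUSBoost-style learners), which balances a highly unbalanced dataset by discarding majority-class (neutral) components until the classes are the same size. The function name and the toy dataset are illustrative assumptions.

```python
import random

def random_undersample(samples, labels, seed=0):
    """Balance a binary dataset by randomly discarding majority-class
    samples until both classes have equal size (random undersampling)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]  # vulnerable components
    neg = [i for i, y in enumerate(labels) if y == 0]  # neutral components
    majority, minority = (neg, pos) if len(neg) > len(pos) else (pos, neg)
    kept = rng.sample(majority, len(minority)) + minority
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]

# A toy, highly unbalanced dataset: 95 neutral vs 5 vulnerable components.
X = list(range(100))
y = [1] * 5 + [0] * 95
Xb, yb = random_undersample(X, y)
assert yb.count(0) == yb.count(1) == 5
```

A balanced training set like `Xb, yb` would then be fed to any of the studied classifiers (e.g. Random Forest); oversampling methods such as SMOTE take the opposite approach and synthesize minority-class samples instead.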
- Is Part Of:
- Journal of Software: Evolution and Process. Volume 31, Issue 9 (2019)
- Journal:
- Journal of Software: Evolution and Process
- Issue:
- Volume 31, Issue 9 (2019)
- Issue Display:
- Volume 31, Issue 9 (2019)
- Year:
- 2019
- Volume:
- 31
- Issue:
- 9
- Issue Sort Value:
- 2019-0031-0009-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2019-04-22
- Subjects:
- classification models -- data sampling -- imbalance datasets -- machine learning -- performance analysis -- software vulnerability prediction
Software engineering -- Periodicals
Computer software -- Development -- Periodicals
Software maintenance -- Periodicals
005.1
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2047-7481
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/smr.2164
- Languages:
- English
- ISSNs:
- 2047-7473
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11872.xml