Predictive maintenance of infrastructure code using "fluid" datasets: An exploratory study on Ansible defect proneness. Issue 11 (14th June 2022)
- Record Type:
- Journal Article
- Title:
- Predictive maintenance of infrastructure code using "fluid" datasets: An exploratory study on Ansible defect proneness. Issue 11 (14th June 2022)
- Main Title:
- Predictive maintenance of infrastructure code using "fluid" datasets: An exploratory study on Ansible defect proneness
- Authors:
- Quattrocchi, Giovanni
Tamburri, Damian Andrew
- Other Names:
- Miranda, Breno (guest editor)
Tuya, Javier (guest editor)
Garrido, Alejandra (guest editor)
- Abstract:
- Abstract: This work consolidates and compounds previous investigations in recognizing defects for infrastructure‐as‐code (IaC) scripts using general software development quality metrics with a focus on defect severity, adding to previous work an explorative look at creating datasets that may boost the predictive power of the provided models; we call this notion a fluid dataset. More specifically, we experiment with 50 different metrics harnessing a multiple dataset creation process whereby different versions of the same datasets are rigged with auto‐training facilities for model retraining and redeployment in a DataOps fashion. With a focus on the Ansible infrastructure code language, a de facto standard for industrial‐strength infrastructure code, we build defect prediction models and improve on the state of the art, reaching an F1 score of 0.52 and a recall of 0.57 using a Naive Bayes classifier. On the one hand, by improving state‐of‐the‐art defect prediction models using metrics generalizable to different IaC languages, we provide interesting leads for the future of infrastructure‐as‐code. On the other hand, we have barely scratched the surface of the novel approach of fluid‐dataset creation and automated retraining of Machine Learning (ML) defect prediction models, which warrants more research in the same direction in the future.
Abstract: This work is an investigation on detecting defects for infrastructure‐as‐code (IaC) with a focus on the creation of datasets. We experiment with 50 different metrics harnessing a multiple dataset creation process whereby different versions of the same datasets are rigged with auto‐training facilities for model retraining and redeployment. We build defect prediction models and improve on the state of the art, reaching an F1 score of 0.52 and a recall of 0.57 using a Naive Bayes classifier.
- Is Part Of:
- Journal of software. Volume 34, Issue 11 (2022)
- Journal:
- Journal of software
- Issue:
- Volume 34, Issue 11 (2022)
- Issue Display:
- Volume 34, Issue 11 (2022)
- Year:
- 2022
- Volume:
- 34
- Issue:
- 11
- Issue Sort Value:
- 2022-0034-0011-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2022-06-14
- Subjects:
- defect prediction -- DevOps -- fluid datasets -- infrastructure code
Software engineering -- Periodicals
Computer software -- Development -- Periodicals
Software maintenance -- Periodicals
005.1
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2047-7481
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/smr.2480
- Languages:
- English
- ISSNs:
- 2047-7473
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24241.xml