Predicting locally manageable resource failures of high availability clusters. (10th July 2022)
- Record Type:
- Journal Article
- Title:
- Predicting locally manageable resource failures of high availability clusters. (10th July 2022)
- Main Title:
- Predicting locally manageable resource failures of high availability clusters
- Authors:
- Somasekaram, Premathas
Calinescu, Radu
- Abstract:
- Critical services from domains as diverse as finance, manufacturing and healthcare are often delivered by complex enterprise applications (EAs). High‐availability clusters (HACs) are software‐managed IT infrastructures that enable these EAs to operate with minimum downtime. This article presents a novel Bayesian decision network model to improve the failure detection capabilities of the HACs components using a comprehensive set of characteristics for the analyzed component. The model then combines these characteristics to predict whether the failure of this component can be managed locally at the failed component level without propagating the failure to upper‐level components and causing a complete system failure. By improving the detection capabilities and predicting locally manageable failures, the model improves the decision‐making process of HACs, and has the potential to reduce the downtime and improve availability for the applications protected by HACs. The model uses the capabilities of the Bayesian decision networks, which combines Bayesian networks with the utility theory, to assign weights to different characteristics and consolidate the related variables to output the result. The model evaluation in a realistic testbed environment with three servers, an established HAC and a well‐known EA shows that the model can improve the area under the receiver operating characteristic curve for prediction of locally manageable failures by up to 9.05% compared to the baseline HAC results.
- Is Part Of:
- Software, practice & experience. Volume 52:Number 10(2022)
- Journal:
- Software, practice & experience
- Issue:
- Volume 52:Number 10(2022)
- Issue Display:
- Volume 52, Issue 10 (2022)
- Year:
- 2022
- Volume:
- 52
- Issue:
- 10
- Issue Sort Value:
- 2022-0052-0010-0000
- Page Start:
- 2191
- Page End:
- 2225
- Publication Date:
- 2022-07-10
- Subjects:
- Bayesian networks -- dependability -- high availability -- high availability clusters -- reliability
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/spe.3119
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23394.xml