Online anomaly detection for multi‐source VMware using a distributed streaming framework. (11th January 2016)
- Record Type:
- Journal Article
- Title:
- Online anomaly detection for multi‐source VMware using a distributed streaming framework. (11th January 2016)
- Main Title:
- Online anomaly detection for multi‐source VMware using a distributed streaming framework
- Authors:
- Solaimani, Mohiuddin
Iftekhar, Mohammed
Khan, Latifur
Thuraisingham, Bhavani
Ingram, Joe
Seker, Sadi Evren - Abstract:
- Summary: Anomaly detection refers to the identification of patterns in a dataset that do not conform to expected patterns. Such non‐conformant patterns typically correspond to samples of interest and are assigned to different labels in different domains, such as outliers, anomalies, exceptions, and malware. A daunting challenge is to detect anomalies in rapid voluminous streams of data. This paper presents a novel, generic real‐time distributed anomaly detection framework for multi‐source stream data. As a case study, we investigate anomaly detection for a multi‐source VMware‐based cloud data center, which maintains a large number of virtual machines (VMs). This framework continuously monitors VMware performance stream data related to CPU statistics (e.g., load and usage). It collects data simultaneously from all of the VMs connected to the network and notifies the resource manager to reschedule its CPU resources dynamically when it identifies any abnormal behavior from its collected data. A semi‐supervised clustering technique is used to build a model from benign training data only. During testing, if a data instance deviates significantly from the model, then it is flagged as an anomaly. Effective anomaly detection in this case demands a distributed framework with high throughput and low latency. Distributed streaming frameworks like Apache Storm, Apache Spark, S4, and others are designed for a lower data processing time and a higher throughput than standard centralizedSummary: Anomaly detection refers to the identification of patterns in a dataset that do not conform to expected patterns. Such non‐conformant patterns typically correspond to samples of interest and are assigned to different labels in different domains, such as outliers, anomalies, exceptions, and malware. A daunting challenge is to detect anomalies in rapid voluminous streams of data. This paper presents a novel, generic real‐time distributed anomaly detection framework for multi‐source stream data. As a case study, we investigate anomaly detection for a multi‐source VMware‐based cloud data center, which maintains a large number of virtual machines (VMs). This framework continuously monitors VMware performance stream data related to CPU statistics (e.g., load and usage). It collects data simultaneously from all of the VMs connected to the network and notifies the resource manager to reschedule its CPU resources dynamically when it identifies any abnormal behavior from its collected data. A semi‐supervised clustering technique is used to build a model from benign training data only. During testing, if a data instance deviates significantly from the model, then it is flagged as an anomaly. Effective anomaly detection in this case demands a distributed framework with high throughput and low latency. Distributed streaming frameworks like Apache Storm, Apache Spark, S4, and others are designed for a lower data processing time and a higher throughput than standard centralized frameworks. We have experimentally compared the average processing latency of a tuple during clustering and prediction in both Spark and Storm and demonstrated that Spark processes a tuple much quicker than storm on average. Copyright © 2016 John Wiley & Sons, Ltd. … (more)
- Is Part Of:
- Software, practice & experience. Volume 46:Number 11(2016)
- Journal:
- Software, practice & experience
- Issue:
- Volume 46:Number 11(2016)
- Issue Display:
- Volume 46, Issue 11 (2016)
- Year:
- 2016
- Volume:
- 46
- Issue:
- 11
- Issue Sort Value:
- 2016-0046-0011-0000
- Page Start:
- 1479
- Page End:
- 1497
- Publication Date:
- 2016-01-11
- Subjects:
- real‐time anomaly detection -- incremental clustering -- resource scheduling -- data center -- Apache Spark -- Apache Storm
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3 - Journal URLs:
- http://onlinelibrary.wiley.com/ ↗
- DOI:
- 10.1002/spe.2390 ↗
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 464.xml