A fast and noise resilient cluster-based anomaly detection. Issue 1 (February 2017)
- Record Type:
- Journal Article
- Title:
- A fast and noise resilient cluster-based anomaly detection. Issue 1 (February 2017)
- Main Title:
- A fast and noise resilient cluster-based anomaly detection
- Authors:
- Bigdeli, Elnaz
Mohammadi, Mahdi
Raahemi, Bijan
Matwin, Stan
- Abstract:
- Clustering, while systematically applied in anomaly detection, has a direct impact on the accuracy of the detection methods. Existing cluster-based anomaly detection methods are mainly based on spherical shape clustering. In this paper, we focus on arbitrary shape clustering methods to increase the accuracy of anomaly detection. However, since the main drawback of arbitrary shape clustering is its high memory complexity, we propose to summarize clusters first. For this, we design an algorithm, called Summarization based on Gaussian Mixture Model (SGMM), to summarize clusters and represent them as Gaussian Mixture Models (GMMs). After the GMMs are constructed, incoming new samples are presented to the GMMs and their membership values are calculated, based on which the new samples are labeled as "normal" or "anomaly." Additionally, to address the issue of noise in the data, instead of labeling samples individually, they are clustered first, and then each cluster is labeled collectively. For this, we present a new approach, called Collective Probabilistic Anomaly Detection (CPAD), in which the distance between the cluster of incoming new samples and the existing SGMMs is calculated, and the new cluster is then given the same label as the closest cluster. To measure the distance between two GMM-based clusters, we propose a modified version of the Kullback–Leibler measure. We run several experiments to evaluate the performance of the proposed SGMM and CPAD methods and compare them against some well-known algorithms, including ABACUS, local outlier factor (LOF), and one-class support vector machine (SVM). The performance of SGMM is compared with ABACUS using the Dunn and DB metrics, and the results indicate that SGMM performs better in terms of summarizing clusters. Moreover, the proposed CPAD method is compared with LOF and one-class SVM on the performance criteria of (a) false alarm rate, (b) detection rate, and (c) memory efficiency. The experimental results show that the CPAD method is noise resilient and memory efficient, and that its accuracy is higher than that of the other methods.
- Is Part Of:
- Pattern analysis and applications. Volume 20:Issue 1(2017:Feb.)
- Journal:
- Pattern analysis and applications
- Issue:
- Volume 20:Issue 1(2017:Feb.)
- Issue Display:
- Volume 20, Issue 1 (2017)
- Year:
- 2017
- Volume:
- 20
- Issue:
- 1
- Issue Sort Value:
- 2017-0020-0001-0000
- Page Start:
- 183
- Page End:
- 199
- Publication Date:
- 2017-02
- Subjects:
- Anomaly detection -- Arbitrary shape clustering -- Gaussian Mixture Model -- Distribution distance
Pattern recognition systems -- Periodicals
Pattern perception -- Periodicals
006.4
- Journal URLs:
- http://link.springer.com/journal/10044
http://www.springer.com/gb/
- DOI:
- 10.1007/s10044-015-0484-0
- Languages:
- English
- ISSNs:
- 1433-7541
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 6412.980451
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10001.xml