Efficient and flexible algorithms for monitoring distance-based outliers over data streams. (January 2016)
- Record Type:
- Journal Article
- Title:
- Efficient and flexible algorithms for monitoring distance-based outliers over data streams. (January 2016)
- Main Title:
- Efficient and flexible algorithms for monitoring distance-based outliers over data streams
- Authors:
- Kontaki, Maria
Gounaris, Anastasios
Papadopoulos, Apostolos N.
Tsichlas, Kostas
Manolopoulos, Yannis - Abstract:
- Abstract: Anomaly detection is considered an important data mining task, aiming at the discovery of elements (known as outliers) that show significant diversion from the expected case. More specifically, given a set of objects the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions for an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if there are less than k objects lying at distance at most R from x . The problem offers significant challenges when a stream-based environment is considered, where data arrive continuously and outliers must be detected on-the-fly. There are a few research works studying the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited and (iii) they lack flexibility in the sense that they assume a single configuration of the k and R parameters. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques are able to reduce the required storage overhead, are more efficient than previously proposed techniques and offer significant flexibility with regard to the input parameters. Experiments performed on real-life andAbstract: Anomaly detection is considered an important data mining task, aiming at the discovery of elements (known as outliers) that show significant diversion from the expected case. More specifically, given a set of objects the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions for an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if there are less than k objects lying at distance at most R from x . The problem offers significant challenges when a stream-based environment is considered, where data arrive continuously and outliers must be detected on-the-fly. There are a few research works studying the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited and (iii) they lack flexibility in the sense that they assume a single configuration of the k and R parameters. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques are able to reduce the required storage overhead, are more efficient than previously proposed techniques and offer significant flexibility with regard to the input parameters. Experiments performed on real-life and synthetic data sets verify our theoretical study. Abstract : Highlights: We prove a linear space lower bound. A novel continuous algorithm is presented, which has two versions (COD). To support different views of outliers, we propose an extension (ACOD). We also propose algorithms based on micro-clusters (MCOD/AMCOD). Performance evaluation results based on both real-life and synthetic data. … (more)
- Is Part Of:
- Information systems. Volume 55(2016)
- Journal:
- Information systems
- Issue:
- Volume 55(2016)
- Issue Display:
- Volume 55, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 55
- Issue:
- 2016
- Issue Sort Value:
- 2016-0055-2016-0000
- Page Start:
- 37
- Page End:
- 53
- Publication Date:
- 2016-01
- Subjects:
- Stream data mining -- Outlier detection
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7 - Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.is.2015.07.006 ↗
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 9070.xml