A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data. (November 2018)
- Record Type:
- Journal Article
- Title:
- A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data. (November 2018)
- Main Title:
- A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data
- Authors:
- Chen, Yewang
Tang, Shengyu
Bouguila, Nizar
Wang, Cheng
Du, Jixiang
Li, HaiLin - Abstract:
- Highlights: The underlying idea is: point p and point q should have similar neighbors, provided p and q are close to each other; given a certain eps, the closer they are, the more similar their neighbors are. NQ-DBSCAN is an exact algorithm that may return the same result as DBSCAN if the parameters are same. While ρ -Approximate DBSCAN is an approximate algorithm. The best complexity of NQ-DBSCAN can be O(n), and the average complexity of NQ-DBSCAN is proved to be O(n log(n)) provided the parameters are properly chosen. While ρ -Approximate DBSCAN runs only in O(n 2 ) in high dimension. NQ-DBSCAN is suitable for clustering data with a lot of noise. Abstract: Clustering is an important technique to deal with large scale data which are explosively created in internet. Most data are high-dimensional with a lot of noise, which brings great challenges to retrieval, classification and understanding. No current existing approach is "optimal" for large scale data. For example, DBSCAN requires O ( n 2 ) time, Fast-DBSCAN only works well in 2 dimensions, and ρ -Approximate DBSCAN runs in O ( n ) expected time which needs dimension D to be a relative small constant for the linear running time to hold. However, we prove theoretically and experimentally that ρ -Approximate DBSCAN degenerates to an O ( n 2 ) algorithm in very high dimension such that 2 D > > n . In this paper, we propose a novel local neighborhood searching technique, and apply it to improve DBSCAN, named asHighlights: The underlying idea is: point p and point q should have similar neighbors, provided p and q are close to each other; given a certain eps, the closer they are, the more similar their neighbors are. NQ-DBSCAN is an exact algorithm that may return the same result as DBSCAN if the parameters are same. While ρ -Approximate DBSCAN is an approximate algorithm. The best complexity of NQ-DBSCAN can be O(n), and the average complexity of NQ-DBSCAN is proved to be O(n log(n)) provided the parameters are properly chosen. While ρ -Approximate DBSCAN runs only in O(n 2 ) in high dimension. NQ-DBSCAN is suitable for clustering data with a lot of noise. Abstract: Clustering is an important technique to deal with large scale data which are explosively created in internet. Most data are high-dimensional with a lot of noise, which brings great challenges to retrieval, classification and understanding. No current existing approach is "optimal" for large scale data. For example, DBSCAN requires O ( n 2 ) time, Fast-DBSCAN only works well in 2 dimensions, and ρ -Approximate DBSCAN runs in O ( n ) expected time which needs dimension D to be a relative small constant for the linear running time to hold. However, we prove theoretically and experimentally that ρ -Approximate DBSCAN degenerates to an O ( n 2 ) algorithm in very high dimension such that 2 D > > n . In this paper, we propose a novel local neighborhood searching technique, and apply it to improve DBSCAN, named as NQ-DBSCAN, such that a large number of unnecessary distance computations can be effectively reduced. Theoretical analysis and experimental results show that NQ-DBSCAN averagely runs in O ( n * log ( n )) with the help of indexing technique, and the best case is O ( n ) if proper parameters are used, which makes it suitable for many realtime data. … (more)
- Is Part Of:
- Pattern recognition. Volume 83(2018:Nov.)
- Journal:
- Pattern recognition
- Issue:
- Volume 83(2018:Nov.)
- Issue Display:
- Volume 83 (2018)
- Year:
- 2018
- Volume:
- 83
- Issue Sort Value:
- 2018-0083-0000-0000
- Page Start:
- 375
- Page End:
- 387
- Publication Date:
- 2018-11
- Subjects:
- DBSCAN -- ρ-Approximate DBSCAN -- NQ-DBSCAN
00-01 -- 99-00
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4 - Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203 ↗
http://www.sciencedirect.com/ ↗ - DOI:
- 10.1016/j.patcog.2018.05.030 ↗
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 16620.xml