A fast DBSCAN algorithm for big data based on efficient density calculation. (1st October 2022)
- Record Type:
- Journal Article
- Title:
- A fast DBSCAN algorithm for big data based on efficient density calculation. (1st October 2022)
- Main Title:
- A fast DBSCAN algorithm for big data based on efficient density calculation
- Authors:
- Hanafi, Nooshin
Saadatfar, Hamid
- Abstract:
- Highlights: A fast and accurate version of the DBSCAN algorithm for big data is proposed. Density is calculated from a small subset of the data called the operational set. The operational set is created and updated at very low computational cost. The proposed method is evaluated comprehensively in comparison with other recent works.
Abstract: Today, data is being generated at high speed, and managing large volumes of data has become a challenge. Clustering is a method for analyzing data generated on the Internet. Various approaches to data clustering have been presented so far. Among them, DBSCAN is one of the best-known density-based clustering algorithms. It can detect clusters of different shapes and does not require prior knowledge of the number of clusters. A major part of the DBSCAN run time is spent calculating the distances between data points in order to find the neighbors of each sample in the dataset. The time complexity of this algorithm is O(n²); therefore, it is not suitable for processing big datasets. In this paper, DBSCAN is improved so that it can be applied to big datasets. The proposed method accurately calculates the density of each sample based on a reduced set of data, called the operational set, which is updated periodically. Using local samples to calculate the density greatly reduces the computational cost of clustering. The empirical results on various datasets of different sizes and dimensions show that the proposed algorithm increases the clustering speed compared to recent related works while achieving accuracy similar to that of the original DBSCAN algorithm.
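Note: the quadratic cost mentioned in the abstract can be illustrated with a minimal sketch of the baseline density step, in which every pair of samples is compared. This is not the authors' operational-set method, which the abstract does not describe in detail. The names `neighbor_counts`, `core_points`, `eps` and `min_pts` are assumptions chosen for illustration, following the usual DBSCAN parameters.

```python
import math


def neighbor_counts(points, eps):
    """Count, for each point, the points within distance eps (itself included).

    Brute-force baseline: every pair is compared once, so the work grows
    as O(n^2) in the number of points.
    """
    n = len(points)
    counts = [1] * n  # each point counts as its own neighbor
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                counts[i] += 1
                counts[j] += 1
    return counts


def core_points(points, eps, min_pts):
    """Return indices of points whose eps-neighborhood holds at least min_pts points."""
    counts = neighbor_counts(points, eps)
    return [i for i, c in enumerate(counts) if c >= min_pts]


# Hypothetical toy data: three nearby points and one distant point.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(neighbor_counts(sample, eps=1.5))
print(core_points(sample, eps=1.5, min_pts=3))
```

Restricting the inner loop to a smaller candidate set is the general idea behind reducing this cost; how the paper selects and updates that set is not stated in this record.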
- Is Part Of:
- Expert systems with applications. Volume 203(2022)
- Journal:
- Expert systems with applications
- Issue:
- Volume 203(2022)
- Issue Display:
- Volume 203, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 203
- Issue:
- 2022
- Issue Sort Value:
- 2022-0203-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-10-01
- Subjects:
- Data Mining -- Clustering -- Big Data -- DBSCAN Algorithm
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2022.117501
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21800.xml