A big data driven distributed density based hesitant fuzzy clustering using Apache spark with application to gene expression microarray. (March 2019)
- Record Type:
- Journal Article
- Title:
- A big data driven distributed density based hesitant fuzzy clustering using Apache spark with application to gene expression microarray. (March 2019)
- Main Title:
- A big data driven distributed density based hesitant fuzzy clustering using Apache spark with application to gene expression microarray
- Authors:
- Hosseini, Behrooz
Kiani, Kourosh
- Abstract:
- This paper introduces a distributed density based clustering approach that uses the hesitant fuzzy weighted correlation coefficient as its similarity measure. All the proposed clustering steps adapt completely to the distributed Spark computation framework, and data dependency is reduced to a minimum using the density reachability concept. By proposing a Resilient Distributed Dataset (RDD) localized subclustering method, the disk I/O burden of MapReduce based clustering approaches is resolved as well. Comparison of the clustering results with similar works shows the superiority of the proposed algorithm in precision and cluster validity indexes. The proposed method also shows relative robustness to the presence of noise in comparison with similar recent works. The proposed approach shows better precision and validity index than MapReduce based algorithms while also outperforming them in computational burden. The scalability of the proposed method is almost similar to that of the standard Spark machine learning library.
Highlights: A new hesitant fuzzy weighted similarity measurement for gene expression. A novel density based soft clustering on the basis of the Apache Spark computational model. Flexible and robust to intrinsic noise with reasonable clustering results. Parallelism with least serial bottleneck, completely adapting to the Spark framework. A scalable method which is efficient and suitable for processing huge volumes of big data.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 79(2019)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 79(2019)
- Issue Display:
- Volume 79, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 79
- Issue:
- 2019
- Issue Sort Value:
- 2019-0079-2019-0000
- Page Start:
- 100
- Page End:
- 113
- Publication Date:
- 2019-03
- Subjects:
- Distributed data clustering -- Big data analytics -- Gene expression -- Apache spark -- Density based clustering -- Hesitant fuzzy decision making
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2019.01.006
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9458.xml