Dolphin-political optimized tversky index based feature selection in spark architecture for clustering big data. (February 2023)
- Record Type:
- Journal Article
- Title:
- Dolphin-political optimized tversky index based feature selection in spark architecture for clustering big data. (February 2023)
- Main Title:
- Dolphin-political optimized tversky index based feature selection in spark architecture for clustering big data
- Authors:
- Chander, Satish
Vijaya, P.
Fernandes, Roshan
Rodrigues, Anisha P
R, Maheswari
- Abstract:
- Highlights: The goal is to design a new method for clustering big data using the developed DPO. The proposed data clustering technique is carried out in the Spark architecture, which contains master and slave nodes for accomplishing the clustering tasks. Feature selection and data augmentation are carried out at the slave nodes, whereas the clustering tasks are performed at the master nodes. The input big data are large and hence must be partitioned into blocks of varying size. The slave nodes receive the partitioned input data and select features from it.
Abstract: Background: The analytics of big data, which is engaged in mining hidden patterns from huge data sets, has gained immense attention over classical data-processing techniques. Clustering is considered an imperative part of relieving the computational complexity. Methods: This paper presents an optimization-driven technique, namely the Dolphin Political Optimizer (DPO), for clustering big data with the Spark model. The clustering model is built on the Spark architecture, which contains master and slave nodes for accomplishing the clustering tasks. Because the input big data are large, they are partitioned into blocks of varying size. Feature selection is done in the slave nodes using the proposed Tversky-based DPO, which is obtained by incorporating the Tversky index into DPO. The proposed DPO is devised by combining Dolphin Echolocation (DE) and the Political Optimizer (PO). Data augmentation is done using oversampling. Clustering of the big data is done using entropy weighting power k-Means clustering, wherein the weight is updated using the proposed DPO algorithm. Results: The proposed DPO is assessed using clustering accuracy, Jaccard coefficient, Rand coefficient, and Silhouette coefficient. It achieved the highest clustering accuracy of 0.937, Jaccard coefficient of 0.670, Rand coefficient of 0.851, and Silhouette coefficient of 0.769. Conclusion: The approach demonstrated improved robustness and produced the globally best solution. Compared with existing methods, the proposed Tversky-based DPO offered effective performance.
- Is Part Of:
- Advances in engineering software. Volume 176(2023)
- Journal:
- Advances in engineering software
- Issue:
- Volume 176(2023)
- Issue Display:
- Volume 176, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 176
- Issue:
- 2023
- Issue Sort Value:
- 2023-0176-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-02
- Subjects:
- Big data clustering -- Tversky index -- Spark model -- Entropy weighting power k-Means clustering -- Data augmentation
Computer-aided engineering -- Periodicals
Engineering -- Computer programs -- Periodicals
Engineering -- Software -- Periodicals
Periodicals
620.0028553
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09659978
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.advengsoft.2022.103331
- Languages:
- English
- ISSNs:
- 0965-9978
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 0705.450000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25302.xml
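
Note on the Tversky index named in the abstract: the record does not say how the paper applies the index inside DPO, so the following is only a minimal sketch of the standard set-based definition, S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). The function name, the default weights, and the choice to return 0.0 for two empty sets are assumptions made for illustration, not details taken from the article.

```python
def tversky_index(x: set, y: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Standard set-based Tversky similarity (illustrative, not the paper's code).

    alpha weights elements found only in x, beta weights elements found only in y.
    """
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X − Y|
    only_y = len(y - x)   # |Y − X|
    denom = common + alpha * only_x + beta * only_y
    # Assumed convention: two empty sets give similarity 0.0.
    return common / denom if denom else 0.0


# Hypothetical example sets, chosen only to show the call.
a = {1, 2, 3}
b = {2, 3, 4}
jaccard_like = tversky_index(a, b, alpha=1.0, beta=1.0)  # reduces to Jaccard
dice_like = tversky_index(a, b, alpha=0.5, beta=0.5)     # reduces to Dice
```

With α = β = 1 the index reduces to the Jaccard coefficient, and with α = β = 0.5 to the Dice coefficient; unequal weights make the measure asymmetric.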