A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses. (20th December 2017)
- Record Type:
- Journal Article
- Title:
- A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses. (20th December 2017)
- Main Title:
- A distributed parallel algorithm for inferring hierarchical groups from large‐scale text corpuses
- Authors:
- Seshadri, Karthick
S. Mercy, Shalinie
Manohar, Sidharth
- Abstract:
- Summary: We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large‐scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large‐scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine‐grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed‐up when compared to existing algorithms like Louvain and, at the same time, produces better quality clusters than either Louvain or spectral clustering algorithms in terms of the F‐score and Rand index.
- Is Part Of:
- Concurrency and computation. Volume 30:Number 11(2018)
- Journal:
- Concurrency and computation
- Issue:
- Volume 30:Number 11(2018)
- Issue Display:
- Volume 30, Issue 11 (2018)
- Year:
- 2018
- Volume:
- 30
- Issue:
- 11
- Issue Sort Value:
- 2018-0030-0011-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2017-12-20
- Subjects:
- distributed algorithm -- hierarchical clustering -- large‐scale clustering -- message passing interface -- spectral clustering -- text clustering
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.4404
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 6615.xml