Exact memory–constrained UPGMA for large scale speaker clustering. (November 2019)
- Record Type:
- Journal Article
- Title:
- Exact memory–constrained UPGMA for large scale speaker clustering. (November 2019)
- Main Title:
- Exact memory–constrained UPGMA for large scale speaker clustering
- Authors:
- Cumani, Sandro
Laface, Pietro
- Abstract:
- Highlights: We focus on exact hierarchical clustering of large sets of utterances. Hierarchical clustering is challenging due to memory constraints. We propose an efficient, exact and parallel implementation of UPGMA clustering. We extend the Clustering Features concept to speaker recognition scoring functions. We assess the efficiency of our method on datasets including 4 million utterances.
Abstract: This work focuses on clustering large sets of utterances collected from an unknown number of speakers. Since the number of speakers is unknown, we focus on exact hierarchical agglomerative clustering, followed by automatic selection of the number of clusters. Exact hierarchical clustering of a large number of vectors, however, is a challenging task due to memory constraints, which make it ineffective or unfeasible for large datasets. We propose an exact memory–constrained and parallel implementation of average linkage clustering for large scale datasets, showing that its computational complexity is approximately O(N²), but that it is much faster (up to 40 times in our experiments) than the Reciprocal Nearest Neighbor chain algorithm, which has O(N²) complexity. We also propose a very fast silhouette computation procedure that, in linear time, determines the set of clusters. The computational efficiency of our approach is demonstrated on datasets including up to 4 million speaker vectors.
- Is Part Of:
- Pattern recognition. Volume 95(2019:Nov.)
- Journal:
- Pattern recognition
- Issue:
- Volume 95(2019:Nov.)
- Issue Display:
- Volume 95 (2019)
- Year:
- 2019
- Volume:
- 95
- Issue Sort Value:
- 2019-0095-0000-0000
- Page Start:
- 235
- Page End:
- 246
- Publication Date:
- 2019-11
- Subjects:
- Clustering -- UPGMA -- Similarity measures -- Reciprocal Nearest Neighbor -- PLDA -- PSVM -- Silhouette -- Cluster quality measures
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.06.018
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11157.xml