A study of speaker clustering for speaker attribution in large telephone conversation datasets. (November 2016)
- Record Type:
- Journal Article
- Title:
- A study of speaker clustering for speaker attribution in large telephone conversation datasets. (November 2016)
- Main Title:
- A study of speaker clustering for speaker attribution in large telephone conversation datasets
- Authors:
- Ghaemmaghami, Houman
Dean, David
Sridharan, Sridha
van Leeuwen, David A.
- Abstract:
- Highlights: Large dataset speaker clustering is more efficient using linkage clustering (O(n log(n))). Need for cluster merging and retraining is eliminated through linkage clustering. Complete-linkage speaker clustering outperforms common retraining-based clustering. Robust stopping criterion by using complete-linkage and cross-likelihood ratio. Robustness of clustering stopping criterion is evaluated on varying datasets.
Abstract: This paper proposes the task of speaker attribution as speaker diarization followed by speaker linking. The aim of attribution is to identify and label common speakers across multiple recordings. To do this, it is necessary to first carry out diarization to obtain speaker-homogeneous segments from each recording. Speaker linking can then be conducted to link common speaker identities across multiple inter-session recordings. This process can be extremely inefficient using the traditional agglomerative cluster merging and retraining commonly employed in diarization. We thus propose an attribution system using complete-linkage clustering (CLC) without model retraining. We show that on top of the efficiency gained through elimination of the retraining phase, greater accuracy is achieved by utilizing the farthest-neighbor criterion inherent to CLC for both diarization and linking. We first evaluate the use of CLC against an agglomerative clustering (AC) without retraining approach, traditional agglomerative clustering with retraining (ACR) and single-linkage clustering (SLC) for speaker linking. We show that CLC provides a relative improvement of 20%, 29% and 39% in attribution error rate (AER) over the three said approaches, respectively. We then propose a diarization system using CLC and show that it outperforms AC, ACR and SLC with relative improvements of 32%, 50% and 70% in diarization error rate (DER), respectively. In our work, we employ the cross-likelihood ratio (CLR) as the model comparison metric for clustering and investigate its robustness as a stopping criterion for attribution. …
- Is Part Of:
- Computer speech & language. Volume 40(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 40(2016)
- Issue Display:
- Volume 40, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 40
- Issue:
- 2016
- Issue Sort Value:
- 2016-0040-2016-0000
- Page Start:
- 23
- Page End:
- 45
- Publication Date:
- 2016-11
- Subjects:
- Speaker attribution -- Linking -- Diarization -- Complete-linkage clustering -- Joint factor analysis -- Cross-likelihood ratio
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.03.005
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1262.xml
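
Illustrative note: the abstract in this record describes complete-linkage clustering, which merges clusters by a farthest-neighbor criterion and stops at a threshold, without retraining models. The sketch below shows that general idea only. It is not the authors' system: it assumes a precomputed symmetric dissimilarity matrix (the paper scores models with the cross-likelihood ratio, which is not reproduced here), the function name `complete_linkage` and the `threshold` parameter are hypothetical, and this naive loop does not achieve the O(n log(n)) efficiency quoted in the highlights.

```python
from itertools import combinations


def complete_linkage(dist, threshold):
    """Cluster items from a symmetric pairwise dissimilarity matrix.

    Repeatedly merges the two clusters whose farthest members are closest,
    and stops once that farthest-member distance exceeds `threshold`
    (a hypothetical stopping value, assumed for illustration).
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: the distance between two clusters is the
        # maximum pairwise distance between their members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break  # stopping criterion reached, no further merges
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Toy dissimilarities (made up): items 0 and 1 are close, as are items 2 and 3.
toy_dist = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [1.0, 0.9, 0.2, 0.0],
]
print(complete_linkage(toy_dist, threshold=0.5))
```

No models are retrained after a merge; cluster distances are read directly from the original pairwise scores, which is the property the abstract credits for the efficiency gain.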