Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization. (September 2017)
- Record Type:
- Journal Article
- Title:
- Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization. (September 2017)
- Main Title:
- Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization
- Authors:
- Lapidot, Itshak
Shoa, Alon
Furmanov, Tal
Aminov, Lidiya
Moyal, Ami
Bonastre, Jean-François
- Abstract:
- Highlights: We show that this imbalance can be regularized and optimized together. We also show that it can be regularized and optimized not only in the probabilistic framework, but for any additive distortion. The advantages of the HDM in terms of regularization and flexibility are demonstrated on the speaker diarization task.
Abstract: Speaker diarization is the problem of separating the speech of unknown speakers in a conversation into parts that are homogeneous with respect to speaker. State-of-the-art diarization systems are based on i-vector methodologies. However, these approaches require large quantities of training data, which must be obtained from an environment similar to that of the conversation being diarized. In this paper we present a diarization system that does not require such training data and instead needs only some development data for parameter tuning. This system is a generalization of the well-known hidden Markov model (HMM), a popular clustering algorithm trained by Viterbi statistics. Our proposed model, referred to as a hidden distortion model (HDM), is based on state distortion models and transition costs, for which probabilistic calculations are not mandatory, in contrast to the case of HMM. We provide a mathematical basis for our approach, and we demonstrate that Viterbi-based HMM can be seen as a special case of HDM. This proximity allows us to apply similar approaches for state-model training when the new paradigm is used to learn sequence dependencies. We carry out diarization of two-speaker telephone conversations in order to evaluate the performance of HDM. When applied to conversations from the LDC CALLHOME database, HDM improves on the performance of a baseline HMM system by about 26% (relative improvement). Moreover, when applied to the NIST 2005 database, it yields a small improvement over the HMM system.
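The abstract describes HDM decoding in terms of additive state distortions and transition costs, with Viterbi-based HMM decoding as a special case. The sketch below illustrates that relationship with a generic min-sum Viterbi recursion; it is not the authors' implementation. The function names (`min_cost_path`, `hmm_viterbi`), the squared-distance distortion, the speaker centroids and the fixed speaker-change penalty are hypothetical choices made for illustration. The paper's actual distortion models, cost values and training procedure are not reproduced here.

```python
from typing import List, Sequence, Tuple


def min_cost_path(
    distortion: Sequence[Sequence[float]],
    transition_cost: Sequence[Sequence[float]],
    initial_cost: Sequence[float],
) -> Tuple[float, List[int]]:
    """Min-sum Viterbi recursion over additive costs (illustrative HDM-style sketch).

    distortion[t][s]      cost of assigning frame t to state s (any additive distortion)
    transition_cost[i][j] cost of moving from state i to state j
    initial_cost[s]       cost of starting in state s
    Returns the lowest total cost and the state sequence that achieves it.
    """
    n_states = len(initial_cost)
    # acc[s]: lowest accumulated cost of any partial path ending in state s
    acc = [initial_cost[s] + distortion[0][s] for s in range(n_states)]
    backpointers: List[List[int]] = []
    for frame in distortion[1:]:
        new_acc, ptr = [], []
        for j in range(n_states):
            # Cheapest predecessor of state j for this frame
            best_i = min(range(n_states), key=lambda i: acc[i] + transition_cost[i][j])
            new_acc.append(acc[best_i] + transition_cost[best_i][j] + frame[j])
            ptr.append(best_i)
        acc = new_acc
        backpointers.append(ptr)
    # Trace back from the cheapest final state
    state = min(range(n_states), key=lambda s: acc[s])
    best_cost = acc[state]
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best_cost, path


def hmm_viterbi(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_emit: Sequence[Sequence[float]],
) -> Tuple[float, List[int]]:
    """HMM Viterbi as a special case: every cost is a negative log-probability."""
    cost, path = min_cost_path(
        [[-v for v in row] for row in log_emit],
        [[-v for v in row] for row in log_trans],
        [-v for v in log_init],
    )
    return -cost, path  # best-path log-likelihood and its state sequence


if __name__ == "__main__":
    # Hypothetical toy example: 1-D features, two speakers with assumed centroids,
    # squared-distance distortion and a fixed speaker-change penalty.
    features = [0.1, 0.2, 0.9, 1.1, 1.0, 0.0]
    centroids = [0.0, 1.0]
    switch_penalty = 0.3
    dist = [[(x - c) ** 2 for c in centroids] for x in features]
    trans = [[0.0, switch_penalty], [switch_penalty, 0.0]]
    cost, labels = min_cost_path(dist, trans, [0.0, 0.0])
    print(labels)  # [0, 0, 1, 1, 1, 0]
```

In the toy run no probabilities are involved: the squared distance and the switch penalty are plain additive costs. `hmm_viterbi` feeds the same recursion with negative log emission, transition and initial probabilities. Minimizing the summed cost is then the same as maximizing the path log-likelihood, which mirrors the special-case relationship stated in the abstract.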
- Is Part Of:
- Computer speech & language. Volume 45(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 45(2017)
- Issue Display:
- Volume 45, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 45
- Issue:
- 2017
- Issue Sort Value:
- 2017-0045-2017-0000
- Page Start:
- 1
- Page End:
- 20
- Publication Date:
- 2017-09
- Subjects:
- Time-series clustering -- Hidden Markov model (HMM) -- Hidden-distortion-model (HDM) -- Speaker diarization
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.01.011
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2060.xml