Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization. (September 2017)

Record Type:: Journal Article
Title:: Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization. (September 2017)
Main Title:: Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization
Authors:: Lapidot, Itshak
Shoa, Alon
Furmanov, Tal
Aminov, Lidiya
Moyal, Ami
Bonastre, Jean-François
Abstract:: Highlights: We show that this imbalance can be regularized and optimized together. We also show that it can be regularized and optimized not only in the probabilistic framework, but for any additive distortion. The advantages of the HDM in terms of regularization and flexibility are demonstrated on the speaker diarization task. Abstract: Speaker diarization is the problem of partitioning a conversation between unknown speakers into segments that are homogeneous in the speaker sense. State-of-the-art diarization systems are based on i-vector methodologies. However, these approaches require large quantities of training data, which must be obtained from an environment similar to that of the conversation being diarized. In this paper we present a diarization system that does not require such training data and needs only some development data for parameter tuning. This system is a generalization of the well-known hidden Markov model (HMM), a popular clustering algorithm trained by Viterbi statistics. Our proposed model, referred to as a hidden distortion model (HDM), is based on state distortion models and transition costs, for which probabilistic calculations are not mandatory, in contrast to the case of HMM. We provide a mathematical basis for our approach, and we demonstrate that Viterbi-based HMM can be seen as a special case of HDM. This proximity allows us to apply similar approaches for state-model training when the new paradigm is used to learn sequence dependencies. …
Is Part Of:: Computer speech & language. Volume 45(2017)
Journal:: Computer speech & language
Issue:: Volume 45(2017)
Issue Display:: Volume 45, Issue 2017 (2017)
Year:: 2017
Volume:: 45
Issue:: 2017
Issue Sort Value:: 2017-0045-2017-0000
Page Start:: 1
Page End:: 20
Publication Date:: 2017-09
Subjects:: Time-series clustering -- Hidden Markov model (HMM) -- Hidden-distortion-model (HDM) -- Speaker diarization
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2017.01.011
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 2060.xml
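
Illustrative note on the abstract: the abstract describes a model built from per-state distortions and transition costs, decoded in the Viterbi manner, with no requirement that the costs be probabilities. The following is a minimal sketch of that decoding step only, written as a generic minimum-cost path search over additive costs. It is not the authors' implementation and does not cover their state-model training or parameter tuning. The function names (`min_cost_path`, `segment`), the squared-error distortion, the fixed centroids and the single `switch_cost` penalty are all assumptions made for illustration. Under the standard correspondence, setting the distortion to the negative log emission probability and the transition cost to the negative log transition probability turns this search into ordinary Viterbi decoding of an HMM.

```python
def min_cost_path(distortion, transition):
    """Viterbi-style search for the state sequence with minimum additive cost.

    distortion[t][s]: cost of assigning frame t to state s (any additive cost).
    transition[r][s]: cost of moving from state r to state s.
    Returns (total_cost, path), where path holds one state index per frame.
    """
    n_states = len(distortion[0])
    cost = list(distortion[0])  # best cost of ending in each state at frame 0
    back = []                   # back-pointers for frames 1..T-1
    for frame_cost in distortion[1:]:
        new_cost, pointers = [], []
        for s in range(n_states):
            # Cheapest predecessor state for landing in state s.
            best_prev = min(range(n_states),
                            key=lambda r: cost[r] + transition[r][s])
            pointers.append(best_prev)
            new_cost.append(cost[best_prev] + transition[best_prev][s]
                            + frame_cost[s])
        cost = new_cost
        back.append(pointers)
    # Trace the best path backwards from the cheapest final state.
    last = min(range(n_states), key=lambda s: cost[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return cost[last], path


def segment(frames, centroids, switch_cost):
    """Hypothetical helper: 1-D frames, squared-error distortion to fixed
    centroids, and a constant penalty for changing state (an assumption)."""
    distortion = [[(x - c) ** 2 for c in centroids] for x in frames]
    n = len(centroids)
    transition = [[0.0 if r == s else switch_cost for s in range(n)]
                  for r in range(n)]
    return min_cost_path(distortion, transition)


if __name__ == "__main__":
    noisy = [0.0, 0.1, 0.6, 0.0, 0.1]
    print(segment(noisy, [0.0, 1.0], switch_cost=0.0)[1])
    print(segment(noisy, [0.0, 1.0], switch_cost=0.5)[1])
```

In this toy setting the switch penalty plays the role of a regularizer: with no penalty the single outlying frame is assigned to the other state, while a positive penalty keeps the whole sequence in one segment.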