Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification. (September 2017)
- Record Type:
- Journal Article
- Title:
- Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification. (September 2017)
- Main Title:
- Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification
- Authors:
- Li, Na
Mak, Man-Wai
Lin, Wei-Wei
Chien, Jen-Tzung
- Abstract:
- Highlights: Model SNR and duration variability of i-vectors in discriminative subspaces. Use variational Bayesian methods to infer the latent variable model that defines the SNR and duration subspaces. Perform better than PLDA, SNR-invariant PLDA and PLDA with uncertainty propagation on long test utterances.
Abstract: Although i-vectors together with probabilistic LDA (PLDA) have achieved great success in speaker verification, how to suppress the undesirable effects caused by the variability in utterance length and background noise level is still a challenge. This paper aims to improve the robustness of i-vector based speaker verification systems by compensating for the utterance-length variability and noise-level variability. Inspired by the recent findings that noise-level variability can be modeled by a signal-to-noise ratio (SNR) subspace and that duration variability can be modeled as additive noise in the i-vector space, we propose to add an SNR factor and a duration factor to the PLDA model. In this framework, we assume that i-vectors derived from utterances with comparable durations share similar duration-specific information and that i-vectors extracted from utterances within a narrow SNR range have similar SNR-specific information. Based on these assumptions, an i-vector can be represented as a linear combination of four components: speaker, SNR, duration, and channel. A variational Bayes algorithm is developed to infer this latent variable model via a discriminative subspace training procedure. In the testing stage, different variabilities are compensated for when computing the likelihood ratio. Experiments on Common Conditions 1 and 4 in NIST 2012 SRE show that the proposed model outperforms the conventional PLDA and SNR-invariant PLDA. Results also show that the proposed model performs better than the uncertainty-propagation PLDA (UP-PLDA) for long test utterances.
- Is Part Of:
- Computer speech & language. Volume 45(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 45(2017)
- Issue Display:
- Volume 45, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 45
- Issue:
- 2017
- Issue Sort Value:
- 2017-0045-2017-0000
- Page Start:
- 83
- Page End:
- 103
- Publication Date:
- 2017-09
- Subjects:
- Speaker verification -- Duration variation -- SNR mismatch -- Variational Bayes -- I-vector -- PLDA
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.04.001
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2060.xml
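- Illustrative note:
- The abstract states that an i-vector can be represented as a linear combination of four components: speaker, SNR, duration, and channel. The following is a minimal sketch of that decomposition only, not the authors' implementation. The symbol names (m, V, U, R, h, w, y, eps), the toy dimensions, and the random values are all assumptions made for illustration, and the variational Bayes inference and likelihood-ratio scoring described in the abstract are not shown.

```python
import random


def matvec(M, v):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def rand_matrix(rng, rows, cols):
    """Random loading matrix; a stand-in for a trained subspace."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def rand_vector(rng, dim, scale=1.0):
    """Random latent factor or residual vector."""
    return [rng.gauss(0.0, scale) for _ in range(dim)]


def compose_ivector(m, V, h, U, w, R, y, eps):
    """Assumed form: x = m + V h + U w + R y + eps.

    V h  : speaker component
    U w  : SNR component
    R y  : duration component
    eps  : residual term standing in for channel variability
    """
    parts = [m, matvec(V, h), matvec(U, w), matvec(R, y), eps]
    return [sum(vals) for vals in zip(*parts)]


rng = random.Random(0)
D, Q_SPK, Q_SNR, Q_DUR = 6, 3, 2, 2  # toy dimensions, chosen arbitrarily

m = rand_vector(rng, D)              # global mean (hypothetical)
V = rand_matrix(rng, D, Q_SPK)       # speaker subspace (hypothetical)
U = rand_matrix(rng, D, Q_SNR)       # SNR subspace (hypothetical)
R = rand_matrix(rng, D, Q_DUR)       # duration subspace (hypothetical)

h = rand_vector(rng, Q_SPK)          # one speaker
w = rand_vector(rng, Q_SNR)          # one SNR range
y_short = rand_vector(rng, Q_DUR)    # factor for a short-duration group
y_long = rand_vector(rng, Q_DUR)     # factor for a long-duration group
eps = rand_vector(rng, D, 0.1)       # same residual reused to isolate duration

x_short = compose_ivector(m, V, h, U, w, R, y_short, eps)
x_long = compose_ivector(m, V, h, U, w, R, y_long, eps)

# With everything else held fixed, the two vectors differ only through R.
diff = [a - b for a, b in zip(x_short, x_long)]
expected = matvec(R, [a - b for a, b in zip(y_short, y_long)])
```

In this toy setup, two utterances from the same speaker and SNR range differ only by the duration term, which is the kind of structure a duration subspace is meant to capture under the assumptions stated in the abstract.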