Nonparametrically trained PLDA for short duration i-vector speaker verification. (November 2018)
- Record Type:
- Journal Article
- Title:
- Nonparametrically trained PLDA for short duration i-vector speaker verification. (November 2018)
- Main Title:
- Nonparametrically trained PLDA for short duration i-vector speaker verification
- Authors:
- Khosravani, Abbas
Homayounpour, Mohammad M.
- Abstract:
- Highlights: We propose estimating the PLDA model parameters used to compute the verification score for a pair of i-vectors representing a trial with a nearest neighbor (NN) technique rather than standard MLE. The proposed nearest neighbor PLDA (NN-PLDA) technique is inspired by the recent success of nonparametric discriminant analysis (NDA) in speaker recognition. We provide analysis of the proposed NN-PLDA model and introduce a duration variability modeling technique in the estimation of the within-speaker scatter matrix so as to compensate for the effect of limited speech data. This ability of NN-PLDA makes it more suitable for application domains where the distribution of data is non-Gaussian or the nonlinear transformation induced by length-normalization is inappropriate or results in loss of information. The proposed duration variability modeling technique in NN-PLDA leads to significant performance improvement in the core–10sec and 10sec–10sec conditions of NIST SRE10, as well as in the truncated extended core condition, relative to G-PLDA. We also conducted an experiment on the more recent SRE'16 and found that the proposed method generalizes well to this evaluation protocol. We also found that the proposed NN-PLDA could better exploit the unlabeled development data to provide a model more adapted to the evaluation data. Our main contribution is to show that modeling between- and within-speaker variability using a nonparametric form for each target speaker leads to notable gains in speaker verification accuracy for both long and short duration utterances.
Abstract: The duration of speech segments can significantly impact the performance of text-independent speaker verification systems. In real-world applications that require high accuracy on short utterances, the performance of the i-vector speaker verification framework degrades significantly, because i-vectors extracted from short utterances are less reliable (i.e., uncertainty is higher) than those extracted from long utterances. Therefore, to handle duration variability properly, a more realistic approach seems to be required.
This study extends our recently proposed nearest neighbor probabilistic linear discriminant analysis (NN-PLDA), which estimates the parameters of PLDA in the i-vector speaker verification framework using a nonparametric form rather than maximum likelihood estimation (MLE) obtained by an EM algorithm, and which has been shown to provide superior performance. In NN-PLDA, the between-speaker covariance matrix that represents global information about the speaker variability is replaced with a local estimate computed on a nearest neighbor basis for each target speaker. Compared to their parametric counterparts, the nonparametric between- and within-speaker scatter matrices can better exploit the discriminant information in training data and are more adapted to sample distributions. In this paper, we provide further analysis of the proposed nonparametrically trained PLDA and introduce a duration variability modeling technique in the estimation of the within-speaker scatter matrix so as to compensate for the effect of limited speech data. We evaluate our approach using the core–10sec and 10sec–10sec telephone trial conditions of NIST 2010 SRE, as well as on truncated test utterances in the extended core condition with duration less than 10 s. We also present the results obtained by the successful incorporation of NN-PLDA in the recent NIST 2016 speaker recognition evaluation. In all experiments, considerable performance improvement is obtained with the proposed technique compared to a generatively trained PLDA model.
- Is Part Of:
- Computer speech & language. Volume 52(2018)
- Journal:
- Computer speech & language
- Issue:
- Volume 52(2018)
- Issue Display:
- Volume 52, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 52
- Issue:
- 2018
- Issue Sort Value:
- 2018-0052-2018-0000
- Page Start:
- 105
- Page End:
- 122
- Publication Date:
- 2018-11
- Subjects:
- Speaker recognition -- PLDA -- Nonparametric -- NIST SRE -- Short duration -- i-Vector
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.12.009
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17055.xml
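
The abstract in this record says that NN-PLDA replaces the global between-speaker covariance with an estimate computed on a nearest neighbor basis. As a rough illustration of that idea only, the sketch below computes an unweighted nonparametric between-class scatter matrix in the general spirit of NDA: each sample is compared with the mean of its k nearest neighbors from every other class. This is not the authors' implementation. The function name `nn_between_scatter`, the parameter `k`, the pooling over all speakers (rather than a separate estimate per target speaker), the omission of any NDA weighting function, and the normalization by the number of accumulated terms are all assumptions made for this example.

```python
import numpy as np


def nn_between_scatter(X, labels, k=3):
    """Unweighted nearest-neighbor between-class scatter (illustrative sketch).

    X      : (n_samples, dim) array of vectors (e.g. i-vectors).
    labels : (n_samples,) array of class (speaker) labels.
    k      : hypothetical number of nearest neighbors per other class.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dim = X.shape[1]
    scatter = np.zeros((dim, dim))
    n_terms = 0
    classes = np.unique(labels)

    for c in classes:
        X_c = X[labels == c]
        for other in classes:
            if other == c:
                continue
            X_o = X[labels == other]
            kk = min(k, len(X_o))
            # Squared Euclidean distances between every sample of class c
            # and every sample of the other class: shape (n_c, n_o).
            d2 = ((X_c[:, None, :] - X_o[None, :, :]) ** 2).sum(axis=2)
            # Indices of the kk closest samples of the other class.
            nn_idx = np.argsort(d2, axis=1)[:, :kk]
            # Local mean of those neighbors for each sample: shape (n_c, dim).
            local_means = X_o[nn_idx].mean(axis=1)
            diffs = X_c - local_means
            # Accumulate the outer products of the difference vectors.
            scatter += diffs.T @ diffs
            n_terms += len(X_c)

    # Assumed normalization: average over all accumulated terms.
    return scatter / max(n_terms, 1)


# Toy usage: two "speakers" separated along the first axis only.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])
S_b = nn_between_scatter(X_toy, y_toy, k=1)
```

In the toy data the two classes differ only along the first axis, so the resulting matrix carries all of its energy in the first diagonal entry. A global (parametric) estimate would use class means instead of local neighbor means.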