A PLDA approach for language and text independent speaker recognition. (September 2017)
- Record Type:
- Journal Article
- Title:
- A PLDA approach for language and text independent speaker recognition. (September 2017)
- Main Title:
- A PLDA approach for language and text independent speaker recognition
- Authors:
- Khosravani, Abbas
Homayounpour, Mohammad M.
- Abstract:
- Highlights: A language-independent PLDA training algorithm has been proposed to improve the performance of text-independent speaker recognition under the multilingual trial condition. The proposed approach takes advantage of multilingual utterances from bilingual speakers to improve speaker recognition in multilingual scenarios. The source normalization technique, which was developed to compensate for speech-source variation, offered superior performance in the cross-language trial condition. The proposed solution can provide significant improvement for non-English trials, which makes it an effective technique for adapting a speaker recognition system to a low-resource language.
Abstract: There are many factors affecting the variability of an i-vector extracted from a speech segment, such as the acoustic content, segment duration, handset type and background noise. The language being spoken is one source of variation that has received limited attention owing to the lack of available multilingual resources. Consequently, discrimination performance is much lower under the multilingual trial condition. Standard session-compensation techniques such as Within-Class Covariance Normalization (WCCN), Linear Discriminant Analysis (LDA) and Probabilistic LDA (PLDA) cannot robustly compensate for language as a source of variation, because too little data is available to represent such variability. The source normalization technique, which was developed to compensate for speech-source variation, offered superior performance in cross-language trials by providing a better estimate of the within-speaker scatter matrix in WCCN and LDA. However, neither language normalization nor the state-of-the-art PLDA algorithm can model language variability on a dataset with insufficient multilingual utterances per speaker, resulting in poor performance under the cross-language trial condition. This study extends our initial development of a language-independent PLDA training algorithm aimed at reducing the effect of language, as a source of variability, on speaker recognition performance. We provide a thorough analysis of how the proposed approach can use multilingual training data from bilingual speakers to robustly compensate for the effect of language.
Evaluated on the multilingual trial condition, the proposed solution demonstrated relative improvements over the baseline system of over 10% in EER and 13% in minimum DCF on the NIST 2008 speaker recognition evaluation, and of 12.4% in EER and 23% in minimum DCF on the PRISM evaluation set, while also providing improvement in other trial conditions.
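The abstract attributes the gain of source normalization to a better estimate of the within-speaker scatter matrix used by WCCN and LDA. The Python sketch below contrasts the conventional estimate with one common source-normalized formulation, assumed here to be S_W = S_T - S_B, where the between-speaker scatter S_B is computed around per-language (per-source) means. All function names, the (speaker, language, i-vector) data layout, the language labels and the toy data are hypothetical illustrations, not code or data from the paper.

```python
from collections import defaultdict

# Hypothetical types: an i-vector is a plain list of floats; matrices are lists of rows.
Vector = list[float]
Matrix = list[list[float]]


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def zeros(dim: int) -> Matrix:
    return [[0.0] * dim for _ in range(dim)]


def add_outer(acc: Matrix, d: Vector, weight: float = 1.0) -> None:
    """Accumulate weight * d d^T into acc in place."""
    for i, di in enumerate(d):
        for j, dj in enumerate(d):
            acc[i][j] += weight * di * dj


def diff(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def standard_within_scatter(data) -> Matrix:
    """Conventional estimate: every i-vector is centred on its own speaker mean.

    data: list of (speaker, language, ivector) tuples (hypothetical layout).
    """
    by_spk = defaultdict(list)
    for spk, _lang, x in data:
        by_spk[spk].append(x)
    s_w = zeros(len(data[0][2]))
    for vecs in by_spk.values():
        mu = mean(vecs)
        for x in vecs:
            add_outer(s_w, diff(x, mu))
    return s_w


def source_normalized_within_scatter(data) -> Matrix:
    """Assumed source-normalized estimate: S_W = S_T - S_B, where the
    between-speaker scatter S_B is measured around per-language means."""
    global_mu = mean([x for _, _, x in data])
    s_t = zeros(len(global_mu))  # total scatter around the global mean
    for _, _, x in data:
        add_outer(s_t, diff(x, global_mu))

    # Group i-vectors by language (the "source"), then by speaker.
    by_lang = defaultdict(lambda: defaultdict(list))
    for spk, lang, x in data:
        by_lang[lang][spk].append(x)

    s_b = zeros(len(global_mu))
    for speakers in by_lang.values():
        lang_mu = mean([x for vecs in speakers.values() for x in vecs])
        for vecs in speakers.values():
            # Each speaker mean is weighted by its number of i-vectors.
            add_outer(s_b, diff(mean(vecs), lang_mu), weight=len(vecs))

    return [[t - b for t, b in zip(rt, rb)] for rt, rb in zip(s_t, s_b)]


# Toy 2-D "i-vectors": every speaker talks in only one language, and the
# hypothetical language "fa" shifts the first dimension by +4.
toy_ivectors = [
    ("A", "en", [1.0, 0.0]), ("A", "en", [1.0, 2.0]),
    ("B", "en", [-1.0, 0.0]), ("B", "en", [-1.0, 2.0]),
    ("C", "fa", [5.0, 0.0]), ("C", "fa", [5.0, 2.0]),
    ("D", "fa", [3.0, 0.0]), ("D", "fa", [3.0, 2.0]),
]

print(standard_within_scatter(toy_ivectors))           # [[0.0, 0.0], [0.0, 8.0]]
print(source_normalized_within_scatter(toy_ivectors))  # [[32.0, 0.0], [0.0, 8.0]]
```

In this toy set no speaker is observed in both languages, so the conventional estimate reports no within-speaker scatter along the first dimension; the source-normalized estimate attributes the language offset (the 32.0 entry) to within-speaker variability. This only illustrates the data-scarcity problem described in the abstract; it is not the language-independent PLDA training algorithm proposed in the paper.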
- Is Part Of:
- Computer speech & language. Volume 45(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 45(2017)
- Issue Display:
- Volume 45, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 45
- Issue:
- 2017
- Issue Sort Value:
- 2017-0045-2017-0000
- Page Start:
- 457
- Page End:
- 474
- Publication Date:
- 2017-09
- Subjects:
- Speaker recognition -- PLDA -- Language mismatch -- Cross-Language -- Multilingual -- NIST SRE
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.04.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2060.xml