A speaker verification backend with robust performance across conditions. (January 2022)
- Record Type:
- Journal Article
- Title:
- A speaker verification backend with robust performance across conditions. (January 2022)
- Main Title:
- A speaker verification backend with robust performance across conditions
- Authors:
- Ferrer, Luciana
McLaren, Mitchell
Brümmer, Niko - Abstract:
- Abstract: In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing them through a backend composed of probabilistic linear discriminant analysis (PLDA) and global logistic regression score calibration. This method is known to result in systems that work poorly on conditions different from those used to train the calibration model. We propose to modify the standard backend, introducing an adaptive calibrator that uses duration and other automatically extracted side-information to adapt to the conditions of the inputs. The backend is trained discriminatively to optimize binary cross-entropy. When trained on a number of diverse datasets that are labeled only with respect to speaker, the proposed backend consistently and, in some cases, dramatically improves calibration, compared to the standard PLDA approach, on a number of held-out datasets, some of which are markedly different from the training data. Discrimination performance is also consistently improved. We show that joint training of the PLDA and the adaptive calibrator is essential — the same benefits cannot be achieved when freezing PLDA and fine-tuning the calibrator. To our knowledge, the results in this paper are the first evidence in the literature that it is possible to develop a speaker verification system with robust out-of-the-boxAbstract: In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing them through a backend composed of probabilistic linear discriminant analysis (PLDA) and global logistic regression score calibration. This method is known to result in systems that work poorly on conditions different from those used to train the calibration model. We propose to modify the standard backend, introducing an adaptive calibrator that uses duration and other automatically extracted side-information to adapt to the conditions of the inputs. The backend is trained discriminatively to optimize binary cross-entropy. When trained on a number of diverse datasets that are labeled only with respect to speaker, the proposed backend consistently and, in some cases, dramatically improves calibration, compared to the standard PLDA approach, on a number of held-out datasets, some of which are markedly different from the training data. Discrimination performance is also consistently improved. We show that joint training of the PLDA and the adaptive calibrator is essential — the same benefits cannot be achieved when freezing PLDA and fine-tuning the calibrator. To our knowledge, the results in this paper are the first evidence in the literature that it is possible to develop a speaker verification system with robust out-of-the-box performance on a large variety of conditions. Highlights: A standard backend for speaker verification is based on PLDA followed by global calibration. We propose a novel backend that consists of a PLDA-like formulation followed by condition-aware calibration. The backend is trained discriminatively to optimize binary cross-entropy on a large variety of conditions. We show that this backend results in significant gains over the standard PLDA approach. The proposed approach results in a system that can be used out-of-the-box on data from different conditions. … (more)
- Is Part Of:
- Computer speech & language. Volume 71(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 71(2022)
- Issue Display:
- Volume 71, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 71
- Issue:
- 2022
- Issue Sort Value:
- 2022-0071-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-01
- Subjects:
- Speaker verification -- Probabilistic linear discriminant analysis -- Robust calibration
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454 - Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/ ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.csl.2021.101258 ↗
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 19058.xml