Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition. (November 2019)
- Record Type:
- Journal Article
- Title:
- Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition. (November 2019)
- Main Title:
- Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition
- Authors:
- Novotný, Ondřej
Plchot, Oldřich
Glembek, Ondřej
Černocký, Jan "Honza"
Burget, Lukáš
- Abstract:
- Highlights: This work presents an analysis of a DNN-based autoencoder for speech enhancement, dereverberation and denoising for robust speaker verification (SV). It includes a detailed performance analysis for different SV system paradigms (i-vectors vs. x-vectors), features and test conditions. Presented results are computed on the public and widely used NIST SRE 2010, 2016, PRISM and SITW. It also presents results achieved with pure autoencoder enhancement, multi-condition PLDA training and their simultaneous use. Best system performance is achieved by combining the DNN autoencoder with x-vectors and multi-condition training in PLDA.
Abstract: In this work, we present an analysis of a DNN-based autoencoder for speech enhancement, dereverberation and denoising. The target application is a robust speaker verification (SV) system. We start our approach by carefully designing a data augmentation process to cover a wide range of acoustic conditions and to obtain rich training data for various components of our SV system. We augment several well-known databases used in SV with artificially noised and reverberated data and we use them to train a denoising autoencoder (mapping noisy and reverberated speech to its clean version) as well as an x-vector extractor which is currently considered as state-of-the-art in SV. Later, we use the autoencoder as a preprocessing step for a text-independent SV system. We compare results achieved with autoencoder enhancement, multi-condition PLDA training and their simultaneous use. We present a detailed analysis with various conditions of NIST SRE 2010, 2016, PRISM and with re-transmitted data. We conclude that the proposed preprocessing can significantly improve both i-vector and x-vector baselines and that this technique can be used to build a robust SV system for various target domains.
- Is Part Of:
- Computer speech & language. Volume 58(2019)
- Journal:
- Computer speech & language
- Issue:
- Volume 58(2019)
- Issue Display:
- Volume 58, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 58
- Issue:
- 2019
- Issue Sort Value:
- 2019-0058-2019-0000
- Page Start:
- 403
- Page End:
- 421
- Publication Date:
- 2019-11
- Subjects:
- Speaker verification -- Signal enhancement -- Autoencoder -- Neural network -- Robustness -- Embedding
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2019.06.004
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11148.xml