Neural adversarial learning for speaker recognition. (November 2019)
- Record Type:
- Journal Article
- Title:
- Neural adversarial learning for speaker recognition. (November 2019)
- Main Title:
- Neural adversarial learning for speaker recognition
- Authors:
- Chien, Jen-Tzung
Peng, Kang-Ting
- Abstract:
- Highlights: A series of neural adversarial learning methods is proposed for speaker recognition. Deep learning is implemented for PLDA speaker recognition with i-vectors. Adversarial manifold learning is proposed to build a subspace model with neighbor embeddings in latent variables. Multiobjective learning is performed for minimax optimization. Adversarial augmentation learning is proposed to compensate for the imbalanced data problem in speaker recognition.
Abstract: This paper presents adversarial learning approaches to various tasks in speaker recognition based on probabilistic linear discriminant analysis (PLDA), which is seen as a latent variable model for reconstruction of i-vectors. The first task aims to reduce the dimension of i-vectors through adversarial manifold learning, where the adversarial neural networks of generator and discriminator are merged to preserve the neighbor embedding of i-vectors in a low-dimensional space. The generator is trained to fool the discriminator with the generated samples in latent space. A PLDA subspace model is constructed by jointly minimizing a PLDA reconstruction error, a manifold loss for neighbor embedding, and an adversarial loss caused by the generator and discriminator. The second task of adversarial learning tackles the imbalanced data problem. A PLDA-based generative adversarial network is trained to generate new i-vectors to balance the number of training utterances across different speakers. An adversarial augmentation learning is proposed for robust speaker recognition. In particular, minimax optimization is performed to estimate a generator and a discriminator such that the class-conditional i-vectors produced by the generator cannot be distinguished from real i-vectors by the discriminator. Multiobjective learning is realized for a specialized neural model with the cosine similarity between real and fake i-vectors as well as a regularization for Gaussianity. Experiments are conducted to show the merit of adversarial learning in subspace construction and data augmentation for PLDA-based speaker recognition.
- Is Part Of:
- Computer speech & language. Volume 58(2019)
- Journal:
- Computer speech & language
- Issue:
- Volume 58(2019)
- Issue Display:
- Volume 58, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 58
- Issue:
- 2019
- Issue Sort Value:
- 2019-0058-2019-0000
- Page Start:
- 422
- Page End:
- 440
- Publication Date:
- 2019-11
- Subjects:
- Probabilistic linear discriminant analysis -- Adversarial learning -- Manifold learning -- Data augmentation -- Speaker recognition
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2019.06.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11148.xml