NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition. (May 2020)
- Record Type:
- Journal Article
- Title:
- NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition. (May 2020)
- Main Title:
- NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition
- Authors:
- Lee, Kong Aik
Yamamoto, Hitoshi
Okabe, Koji
Wang, Qiongqiong
Guo, Ling
Koshinaka, Takafumi
Zhang, Jiacen
Shinoda, Koichi
- Abstract:
- This paper describes the NEC-TT speaker recognition system designed for the 2018 Speaker Recognition Evaluation (SRE'18) benchmarking. The NEC-TT submission was among the best-performing systems in this latest edition of SRE organized by the National Institute of Standards and Technology (NIST). It comprises multiple sub-systems based on a deep speaker embedding front-end followed by a probabilistic linear discriminant analysis (PLDA) back-end. Speaker embeddings are continuous-valued vector representations that allow easy comparison between speaker voices with simple geometric operations. The effectiveness of deep speaker embeddings relies on the quantity and diversity of the training data. To this end, we hinge on data augmentation and mixed-bandwidth training strategies to increase the number of training examples and speakers. By doing so, we not only increase the quantity of the training data but also expand the output softmax layer with a larger number of speaker classes. From a system design perspective, we adopted a two-stage pipeline consisting of a general multi-domain speaker embedding front-end followed by a domain-specific PLDA back-end. This has a significant benefit in commercial deployment since the same speaker embedding front-end could be used with multiple domain-adapted PLDA back-ends to cater to every specific deployment. This paper provides a detailed description and analysis of the design methodology, data augmentation, bandwidth extension, multi-head attention, PLDA adaptation, and other components that have contributed to good performance in NEC-TT's SRE'18 results.
- Is Part Of:
- Computer speech & language. Volume 61(2020)
- Journal:
- Computer speech & language
- Issue:
- Volume 61(2020)
- Issue Display:
- Volume 61, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 61
- Issue:
- 2020
- Issue Sort Value:
- 2020-0061-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-05
- Subjects:
- Speaker recognition -- benchmark evaluation -- domain adaptation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2019.101033
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12564.xml