Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models. (November 2017)
- Record Type:
- Journal Article
- Title:
- Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models. (November 2017)
- Main Title:
- Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models
- Authors:
- Zeinali, Hossein
Sameti, Hossein
Burget, Lukáš
Černocký, Jan "Honza"
- Abstract:
- Highlights: Performance of DNNs trained on 16 kHz and 8 kHz data is compared in the text-dependent speaker verification task. Performance of different DNN configurations (namely the number of senones used and the DNN targets) is compared in the text-dependent speaker verification task. An investigation into Bottleneck Alignment (BNA) is added. Besides Imposter-Correct trials, results on Target-Wrong trials are newly included, as this trial type is very important in text-dependent speaker verification. In addition to RSR2015, all results are also reported on RedDots data.
Abstract: Inspired by the success of Deep Neural Networks (DNN) in text-independent speaker recognition, we have recently demonstrated that similar ideas can also be applied to the text-dependent speaker verification task. In this paper, we describe new advances with our state-of-the-art i-vector based approach to text-dependent speaker verification, which also makes use of different DNN techniques. In order to collect sufficient statistics for i-vector extraction, different frame alignment models are compared, such as GMMs, phonemic HMMs or DNNs trained for senone classification. We also experiment with DNN-based bottleneck features and their combinations with standard MFCC features. We experiment with a few different DNN configurations and investigate the importance of training DNNs on 16 kHz speech. The results are reported on the RSR2015 dataset, where training material is available for all possible enrollment and test phrases. Additionally, we report results on the more challenging RedDots dataset, where the system is built in a truly phrase-independent way. …
- Is Part Of:
- Computer speech & language. Volume 46(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 46(2017)
- Issue Display:
- Volume 46, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 46
- Issue:
- 2017
- Issue Sort Value:
- 2017-0046-2017-0000
- Page Start:
- 53
- Page End:
- 71
- Publication Date:
- 2017-11
- Subjects:
- Deep Neural Network -- Text-dependent -- Speaker verification -- i-Vector -- Frame alignment -- Bottleneck features
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.04.005
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2908.xml