Relevance factor of maximum a posteriori adaptation for GMM–NAP–SVM in speaker and language recognition. (March 2015)
- Record Type:
- Journal Article
- Title:
- Relevance factor of maximum a posteriori adaptation for GMM–NAP–SVM in speaker and language recognition. (March 2015)
- Main Title:
- Relevance factor of maximum a posteriori adaptation for GMM–NAP–SVM in speaker and language recognition
- Authors:
- You, Chang Huai
Li, Haizhou
Lee, Kong Aik
- Abstract:
- This paper studies the relevance factor in maximum a posteriori (MAP) adaptation of Gaussian mixture model (GMM) for speaker and language recognition. Knowing that relevance factor determines how much the observed training data influence the model adaptation, thus the resulting GMM model, it is believed that more effective modeling can be achieved if the relevance factor is adaptive to the corresponding data. We therefore provide a mathematic derivation for the estimation of relevance factor. GMM supervector support vector machine (SVM) with nuisance attribute projection (NAP) (GMM–NAP–SVM) has been reported to be effective and reliable for speaker and language recognition. Being a discriminative classifier in nature, a GMM–NAP–SVM system is sensitive to the magnitude and direction of a supervector in the high dimensional space. However, when characterizing a speech utterance with GMM supervector estimated through MAP, we observe that the resulting supervector is undesirably affected by the varying duration of the utterance. We propose an adaptive relevance factor that adapts to the duration to mitigate the variability effect due to the length of utterance. We give a systematic investigation on different types of relevance factor of MAP in different applicatively platforms. We show the efficacy of the data-dependent as well as adaptive relevance factors on the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2008 and language recognition evaluation (LRE) 2009 and 2011 tasks respectively.
- Is Part Of:
- Computer speech & language. Volume 30(2015)
- Journal:
- Computer speech & language
- Issue:
- Volume 30(2015)
- Issue Display:
- Volume 30, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 30
- Issue:
- 2015
- Issue Sort Value:
- 2015-0030-2015-0000
- Page Start:
- 116
- Page End:
- 134
- Publication Date:
- 2015-03
- Subjects:
- Maximum a posteriori -- Supervector -- Gaussian mixture model -- Support vector machine
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2014.09.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5423.xml