Feature-space SVM adaptation for speaker adapted word prominence detection. (January 2019)
- Record Type:
- Journal Article
- Title:
- Feature-space SVM adaptation for speaker adapted word prominence detection. (January 2019)
- Main Title:
- Feature-space SVM adaptation for speaker adapted word prominence detection
- Authors:
- Schnall, Andrea
Heckmann, Martin
- Abstract:
- Highlights: New method for speaker adaptation in a prosodic prominence detection framework. Discriminative adaptation method which takes the properties of an SVM-based classifier into account. Additional regularization via Gaussianity and sparseness constraints. Detailed evaluation, also in comparison to a DNN, which analyzes the contribution of the different regularization terms. Notable improvements of the proposed method compared to standard approaches.
Abstract: Prosodic cues such as word prominence play a fundamental role in human communication, e.g., to express important information. Since different speakers use a wide variety of features to express prominence, there is a large difference in performance between speaker-dependently and speaker-independently trained models. To cope with these variations without training a new speaker-dependent model, speaker adaptation techniques such as feature-space Maximum Likelihood Linear Regression (fMLLR) have turned out to be very useful in speech recognition. These methods were developed for GMM-HMM based classifiers under the assumption that the data can be well modeled via a mixture of a few Gaussian distributions. However, in many cases these assumptions are too restrictive. In particular, a discriminative classifier such as an SVM often yields far superior results to a GMM. Therefore, we propose a new adaptation method, which adapts the data to the radial basis function kernel of the SVM. To avoid overfitting we apply two regularization terms. The first is based on fMLLR and the second is an L1 regularization to enforce a sparse transformation matrix. We analyze the method in the context of speaker adaptation for word prominence detection, with varying amounts of adaptation data and different weights of the regularization terms. We show that our novel method clearly outperforms fMLLR-GMM and fMLLR-SVM based adaptation.
- Is Part Of:
- Computer speech & language. Volume 53(2019)
- Journal:
- Computer speech & language
- Issue:
- Volume 53(2019)
- Issue Display:
- Volume 53, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 53
- Issue:
- 2019
- Issue Sort Value:
- 2019-0053-2019-0000
- Page Start:
- 198
- Page End:
- 216
- Publication Date:
- 2019-01
- Subjects:
- Prosody -- Speaker adaptation -- FMLLR -- SVM -- Prominence
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2018.06.001
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 7651.xml