Regularization of neural network model with distance metric learning for i-vector based spoken language identification. (July 2017)
- Record Type:
- Journal Article
- Title:
- Regularization of neural network model with distance metric learning for i-vector based spoken language identification. (July 2017)
- Main Title:
- Regularization of neural network model with distance metric learning for i-vector based spoken language identification
- Authors:
- Lu, Xugang
Shen, Peng
Tsao, Yu
Kawai, Hisashi
- Abstract:
- Highlights: Pair-wise distance metric learning is designed on the feature transform layers. A soft-max layer is stacked on the feature transform layers as a classifier layer. The coupled feature extraction and classification functions are learned with a regularized objective function. Experiments showed significant improvements compared to conventional regularization algorithms.
Abstract: The i-vector representation and modeling technique has been successfully applied in spoken language identification (SLI). The advantage of the i-vector representation is that any speech utterance of variable duration can be represented as a fixed-length vector. In modeling, a discriminative transform or classifier must be applied to emphasize the variations correlated to language identity, since the i-vector representation encodes several types of acoustic variation (e.g., speaker variation, transmission channel variation, etc.). Owing to its strong nonlinear discriminative power, the neural network model has been used directly to learn the mapping function between the i-vector representation and the language identity labels. In most studies, only the point-wise feature-label information is fed to the model for parameter learning, which may result in model overfitting, particularly with limited training data. In this study, we propose to integrate pair-wise distance metric learning as a regularizer of model parameter optimization. In the representation space of the nonlinear transforms in the hidden layers, a distance metric is explicitly learned to minimize the pair-wise intra-class variation and maximize the inter-class variation. Using the pair-wise distance metric learning, the i-vectors are transformed to a new feature space, wherein they are much more discriminative for samples belonging to different languages while being much more similar for samples belonging to the same language. We tested the algorithm on an SLI task and obtained promising results, which outperformed conventional regularization methods.
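The objective described in the abstract (point-wise cross entropy plus a pair-wise distance metric term over hidden-layer representations) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the contrastive-style hinge on inter-class pairs, the `margin`, and the weight `lam` are all assumptions introduced here for clarity.

```python
import numpy as np

def pairwise_metric_regularizer(h, labels, margin=1.0):
    """Pair-wise distance metric term over hidden representations h.

    Penalizes large distances between same-class pairs and inter-class
    distances smaller than a margin (a common contrastive-style choice;
    the paper's exact objective may differ).
    """
    n, loss, pairs = len(h), 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = float(np.sum((h[i] - h[j]) ** 2))
            if labels[i] == labels[j]:
                loss += d2  # pull same-language pairs together
            else:
                # push different-language pairs at least `margin` apart
                loss += max(0.0, margin - np.sqrt(d2)) ** 2
            pairs += 1
    return loss / pairs

def regularized_objective(ce_loss, h, labels, lam=0.1):
    # total objective = point-wise cross entropy
    #                 + lam * pair-wise distance metric term
    return ce_loss + lam * pairwise_metric_regularizer(h, labels)
```

In practice this term would be computed on mini-batch activations of the feature transform layers and back-propagated jointly with the soft-max cross entropy, so the same hidden layers serve both the classification and the metric-learning criteria.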
- Is Part Of:
- Computer speech & language. Volume 44(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 44(2017)
- Issue Display:
- Volume 44, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 44
- Issue:
- 2017
- Issue Sort Value:
- 2017-0044-2017-0000
- Page Start:
- 48
- Page End:
- 60
- Publication Date:
- 2017-07
- Subjects:
- Neural network model -- Cross entropy -- Pair-wise distance metric learning -- Spoken language identification,
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.01.006
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 371.xml