Parametric representation of excitation source information for language identification. (January 2017)
- Record Type:
- Journal Article
- Title:
- Parametric representation of excitation source information for language identification. (January 2017)
- Main Title:
- Parametric representation of excitation source information for language identification
- Authors:
- Nandi, Dipanjan
Pati, Debadatta
Rao, K. Sreenivasa
- Abstract:
- Highlights: Excitation source information is explored for language identification. RMFCC and MPDSS features represent segmental level language-specific information. GFD parameters capture sub-segmental level language-specific information. PC and ESC represent supra-segmental level excitation source information. Complementary information from source and system features is examined.
Abstract: In this work, the linear prediction (LP) residual signal has been parameterized to capture the excitation source information for language identification (LID). The LP residual signal has been processed at three different levels (sub-segmental, segmental and supra-segmental) to demonstrate different aspects of language-specific excitation source information. The proposed excitation source features have been evaluated on 27 Indian languages from the Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC), and on the Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) and National Institute of Standards and Technology Language Recognition Evaluation (NIST LRE) 2011 corpora. LID systems were developed using Gaussian mixture model (GMM) and i-vector based approaches. Experimental results have shown that segmental level parametric features provide better identification accuracy (62%) than sub-segmental (40%) and supra-segmental level (34%) features. Excitation source features obtained from the three levels show distinct language-specific evidence. Therefore, the scores from all three levels are combined to obtain the complete excitation source information for the LID task. The LID performances achieved from the excitation source and the vocal tract system are compared. Finally, the scores obtained by processing the vocal tract and excitation source features are combined to further improve LID accuracy. The best recognition accuracies obtained from stage-IV integrated LID systems I, II and III are 69%, 70% and 72% respectively.
- Is Part Of:
- Computer speech & language. Volume 41(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 41(2016)
- Issue Display:
- Volume 41, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 41
- Issue:
- 2016
- Issue Sort Value:
- 2016-0041-2016-0000
- Page Start:
- 88
- Page End:
- 115
- Publication Date:
- 2017-01
- Subjects:
- Excitation source information -- Language identification (LID) -- LP residual -- GFD parameters -- RMFCC -- MPDSS -- Pitch contour -- Epoch strength contour -- MFCC -- i-Vector -- IITKGP-MLILSC -- OGI-MLTS -- NIST LRE -- Cavg
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.05.001
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2481.xml