Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization. (19th June 2008)
- Record Type:
- Journal Article
- Title:
- Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization. (19th June 2008)
- Main Title:
- Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization
- Authors:
- Yapanel, Umit H.
Hansen, John H. L.
- Other Names:
- Kuo, Sen M., Academic Editor.
- Abstract:
- A proven method for achieving effective automatic speech recognition (ASR) despite speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that, whereas conventional front-end processing with PMVDR and VTLN requires two separate warping phases, the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the word error rate (WER) by a relative 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
- Is Part Of:
- EURASIP journal on audio, speech, and music processing. Volume 2008(2008)
- Journal:
- EURASIP journal on audio, speech, and music processing
- Issue:
- Volume 2008(2008)
- Issue Display:
- Volume 2008, Issue 2008 (2008)
- Year:
- 2008
- Volume:
- 2008
- Issue:
- 2008
- Issue Sort Value:
- 2008-2008-2008-0000
- Page Start:
- Page End:
- Publication Date:
- 2008-06-19
- Subjects:
- Sound -- Recording and reproducing -- Digital techniques -- Periodicals
Computer sound processing -- Periodicals
Computer sound processing
Sound -- Recording and reproducing -- Digital techniques
Periodicals
Electronic journal
Electronic journals
620.2
- Journal URLs:
- https://asmp-eurasipjournals.springeropen.com/
http://www.hindawi.com/GetJournal.aspx?journal=ASMP
http://www.hindawi.com/journals/asmp/contents.html
http://link.springer.com/
- DOI:
- 10.1155/2008/148967
- Languages:
- English
- ISSNs:
- 1687-4714
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10486.xml