Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments. (November 2017)
- Record Type:
- Journal Article
- Title:
- Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments. (November 2017)
- Main Title:
- Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments
- Authors:
- Barfuss, Hendrik
Huemmer, Christian
Schwarz, Andreas
Kellermann, Walter
- Abstract:
- Highlights: Proposal of two versions of a coherence-based postfilter which can be realized very efficiently, since only the spatial coherence between the microphone signals needs to be estimated. Both postfilter realizations significantly reduce the word error rates of the state-of-the-art CHiME-3 baseline speech recognition system. Reductions of the word error rates are obtained for almost all evaluated scenarios, which shows that our proposed coherence-based speech enhancement is very effective and works robustly in adverse real-world environments.
Abstract: Speech recognition in adverse real-world environments is highly affected by reverberation and non-stationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech enhancement system is extended by a coherence-based postfilter, and the postfilter's impact on the Word Error Rates (WERs) of a state-of-the-art automatic speech recognition system is investigated for the realistic noisy environments provided by CHiME-3. To determine the time- and frequency-dependent postfilter gains, we use Direction-of-Arrival (DOA)-dependent and DOA-independent estimators of the coherent-to-diffuse power ratio as an approximation of the short-time signal-to-noise ratio. Our experiments show that incorporating coherence-based postfiltering into the CHiME-3 baseline speech enhancement system leads to a significant reduction of the WERs, with relative improvements of up to 11.31%.
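The abstract describes time- and frequency-dependent postfilter gains derived from an estimate of the coherent-to-diffuse power ratio (CDR), used as a stand-in for the short-time signal-to-noise ratio. The following is a minimal illustrative sketch of that idea, not the authors' implementation: it assumes a CDR estimate is already available per time-frequency bin on a linear scale, and it uses a plain Wiener-type gain CDR / (CDR + 1) with a lower gain floor. The function names, the `gain_floor` parameter, and its default value are hypothetical and do not come from the article.

```python
import numpy as np


def wiener_gain_from_cdr(cdr, gain_floor=0.1):
    """Map CDR estimates (linear scale) to Wiener-type postfilter gains.

    Assumption: the CDR approximates the short-time SNR, so the gain is
    CDR / (CDR + 1). The floor (hypothetical default 0.1) limits the
    maximum attenuation applied to any time-frequency bin.
    """
    cdr = np.maximum(np.asarray(cdr, dtype=float), 0.0)  # clip negative estimates
    gain = cdr / (cdr + 1.0)
    return np.maximum(gain, gain_floor)


def apply_postfilter(beamformer_stft, cdr, gain_floor=0.1):
    """Scale each time-frequency bin of the beamformer output by its gain."""
    return wiener_gain_from_cdr(cdr, gain_floor) * np.asarray(beamformer_stft)


if __name__ == "__main__":
    # Toy example: three bins with low, medium, and high CDR.
    cdr_estimate = np.array([0.0, 1.0, 3.0])
    spectrum = np.array([1.0 + 1.0j, 2.0 + 0.0j, 0.0 - 4.0j])
    print(wiener_gain_from_cdr(cdr_estimate))
    print(apply_postfilter(spectrum, cdr_estimate))
```

In this sketch, bins dominated by diffuse components (low CDR) are attenuated down to the floor, while bins dominated by the coherent, directional component (high CDR) pass almost unchanged.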
- Is Part Of:
- Computer speech & language. Volume 46(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 46(2017)
- Issue Display:
- Volume 46, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 46
- Issue:
- 2017
- Issue Sort Value:
- 2017-0046-2017-0000
- Page Start:
- 388
- Page End:
- 400
- Publication Date:
- 2017-11
- Subjects:
- Robust speech recognition -- Postfiltering -- Spectral enhancement -- Coherence-to-diffuse power ratio -- Wiener filter
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.02.005
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 4753.xml