Comparing human and automatic speech recognition in simple and complex acoustic scenes. (November 2018)

Record Type:: Journal Article
Title:: Comparing human and automatic speech recognition in simple and complex acoustic scenes. (November 2018)
Main Title:: Comparing human and automatic speech recognition in simple and complex acoustic scenes
Authors:: Spille, Constantin
Kollmeier, Birger
Meyer, Bernd T.
Abstract:: Highlights: Automatic speech recognition and human listeners are compared in single-channel and spatial scenes. In single-channel scenes, ASR is on a par with normal-hearing listeners. In spatial scenes, there is a substantial human-machine gap of 12.3 dB. 5.3 dB of this gap can be attributed to poor localization and missing speaker-related features. Abstract: Former comparisons of human speech recognition (HSR) and automatic speech recognition (ASR) have shown that humans outperform ASR systems in nearly all speech recognition tasks. However, recent progress in ASR has led to substantial improvements of recognition accuracy, and it is therefore unclear how large the task-dependent human-machine gap still remains. This paper investigates this gap between HSR and ASR based on deep neural networks (DNNs) in different acoustic conditions, with the aim of comparing differences and identifying processing strategies that should be considered in ASR. We find that DNN-based ASR reaches human performance for single-channel, small-vocabulary tasks in the presence of speech-shaped noise and in multi-talker babble noise, which is an important difference to previous human-machine comparisons: The speech reception threshold, i.e., the signal-to-noise ratio with 50% word recognition rate is at about −7 to −8 dB both for HSR and ASR. However, in more complex spatial scenes with diffuse noise and moving talkers, the SRT gap amounts to approximately 12 dB. Based on cross comparisons that use …
Is Part Of:: Computer speech & language. Volume 52(2018)
Journal:: Computer speech & language
Issue:: Volume 52(2018)
Issue Display:: Volume 52, Issue 2018 (2018)
Year:: 2018
Volume:: 52
Issue:: 2018
Issue Sort Value:: 2018-0052-2018-0000
Page Start:: 123
Page End:: 140
Publication Date:: 2018-11
Subjects:: Human-machine comparison -- Speech recognition threshold -- Deep neural networks -- Speech intelligibility prediction -- Spatial scenes
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2018.04.003
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 17055.xml