The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes. (November 2017)
- Record Type:
- Journal Article
- Title:
- The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes. (November 2017)
- Main Title:
- The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes
- Authors:
- Barker, Jon
Marxer, Ricard
Vincent, Emmanuel
Watanabe, Shinji
- Abstract:
- Highlights: The presentation of a unique multi-microphone speech recognition challenge with speech recorded in real environments. A detailed characterisation of the challenge audio using novel analyses to estimate key properties of the speakers, environments and noisy speech signals. An overview of 26 systems submitted to the challenge presenting a snapshot of the state-of-the-art in distant microphone ASR. A presentation of system performance identifying which signal processing and statistical modelling techniques are the most beneficial. A presentation of correlations between signal characteristics and system performances across utterances addressing the question, "What are the particular circumstances that lead to high word error rates?"
Abstract: This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.
- Is Part Of:
- Computer speech & language. Volume 46(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 46(2017)
- Issue Display:
- Volume 46, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 46
- Issue:
- 2017
- Issue Sort Value:
- 2017-0046-2017-0000
- Page Start:
- 605
- Page End:
- 626
- Publication Date:
- 2017-11
- Subjects:
- Noise-robust ASR -- Microphone array -- 'CHiME' challenge
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.10.005
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2908.xml