Automatic quality estimation for ASR system combination. (January 2018)
- Record Type:
- Journal Article
- Title:
- Automatic quality estimation for ASR system combination. (January 2018)
- Main Title:
- Automatic quality estimation for ASR system combination
- Authors:
- Jalalvand, Shahab
Negri, Matteo
Falavigna, Daniele
Matassoni, Marco
Turchi, Marco
- Abstract:
- Highlights: Review on automatic speech recognition quality estimation (ASR QE). The application of ASR QE to ASR system combination, both in the single-microphone multiple-ASR-system task and in the multiple-microphone multiple-ASR-system task. Ranking the system combination inputs based on predicted quality. Management of tied ranks. Automatically finding the optimum level of combination for each segment.
Abstract: Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). To select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, depends heavily on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) to rank the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and from multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.
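The abstract's central idea is that hypotheses are ranked by predicted quality before fusion, rather than being passed to ROVER in random order. The sketch below is a toy illustration under stated assumptions and is not the paper's method. It assumes the hypotheses are already aligned into word slots (real ROVER builds a word transition network by dynamic-programming alignment). It also assumes a QE model has already produced one predicted WER per hypothesis. Finally, it uses plain per-slot majority voting with ties broken in favour of the better-ranked hypothesis. The names `rank_hypotheses` and `vote` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Each hypothesis is a list of word slots; None marks an empty slot
# (a deletion, like ROVER's null arc). Alignment is assumed done upstream.
Hypothesis = List[Optional[str]]


def rank_hypotheses(hyps: Sequence[Hypothesis],
                    predicted_wer: Sequence[float]) -> List[int]:
    """Return hypothesis indices ordered best-first by predicted WER.

    Python's sort is stable, so hypotheses with tied predictions keep their
    input order. This is one simple tie policy, assumed for illustration.
    """
    return sorted(range(len(hyps)), key=lambda i: predicted_wer[i])


def vote(aligned: Sequence[Hypothesis], order: List[int]) -> List[str]:
    """Per-slot majority vote; ties go to the higher-ranked hypothesis."""
    output: List[str] = []
    for s in range(len(aligned[0])):
        counts: Dict[Optional[str], int] = {}
        for i in order:  # insert candidates in rank order
            word = aligned[i][s]
            counts[word] = counts.get(word, 0) + 1
        # max() returns the first maximal key, and dict keys keep insertion
        # (= rank) order, so a tie resolves to the best-ranked candidate.
        best = max(counts, key=lambda w: counts[w])
        if best is not None:  # a winning empty slot emits no word
            output.append(best)
    return output


if __name__ == "__main__":
    aligned = [
        ["the", "cat", "sat", None],
        ["the", "bat", "sat", "down"],
        ["a", "hat", "sat", "down"],
    ]
    order = rank_hypotheses(aligned, [0.30, 0.10, 0.25])
    print(order)                    # [1, 2, 0]
    print(vote(aligned, order))     # ['the', 'bat', 'sat', 'down']
    print(vote(aligned, [0, 1, 2])) # ['the', 'cat', 'sat', 'down']
```

The second slot is a three-way tie, so its output depends only on the input order. This toy case shows why feeding ROVER a quality-based ranking, rather than a random one, can change the combined transcription.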
- Is Part Of:
- Computer speech & language. Volume 47(2018)
- Journal:
- Computer speech & language
- Issue:
- Volume 47(2018)
- Issue Display:
- Volume 47, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 47
- Issue:
- 2018
- Issue Sort Value:
- 2018-0047-2018-0000
- Page Start:
- 214
- Page End:
- 239
- Publication Date:
- 2018-01
- Subjects:
- Automatic speech recognition -- Quality estimation -- System combination
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.06.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20832.xml