A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions. (September 2020)
- Record Type:
- Journal Article
- Title:
- A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions. (September 2020)
- Main Title:
- A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions
- Authors:
- Tündik, Máté Ákos
Tarján, Balázs
Szaszák, György
- Abstract:
- Highlights: Low latency, real-time automatic punctuation model. RNN-based punctuation outperforms the MaxEnt baseline. Subjective tests confirm that humans prefer punctuated captions. Deaf or hard of hearing users prefer automatic punctuation even more.
Abstract: In Automatic Speech Recognition (ASR), inserting punctuation marks into the word chain hypothesis has long been given low priority, as efforts were concentrated on minimizing word error rates. Punctuation, however, also has a high impact on the transcription quality perceived by the users. Prosody, textual context and their combination have since been used successfully for automatic punctuation of ASR outputs. The recently proposed RNN-based solutions show encouraging performance. We believe that current bottlenecks of punctuation technology are on one hand the complex punctuation models, which, having high latency, are not suitable for use-cases with real-time requirements; and on the other hand, punctuation efforts have not been validated against human perception and user impression. The ambition of this paper is to propose a lightweight, yet powerful RNN punctuation model for on-line (real-time, low-latency) environments, and also to assess user opinion, in general and also for target users living with hearing loss or impairment. The proposed on-line RNN punctuation model is evaluated against a Maximum Entropy (MaxEnt) baseline, for Hungarian and for English, whereas subjective assessment tests are carried out on real broadcast data subtitled with ASR (closed captioning). As can be expected, the RNN outperforms the MaxEnt baseline system, but of course not the off-line systems: limiting the future context to minimize latency results in only a slight performance drop, but ASR errors obviously influence punctuation performance considerably. A genre analysis is also carried out w.r.t. the punctuation performance, showing that both recognition and punctuation of more spontaneous speech styles are challenging. Overall, the subjective tests confirmed that users perceive a significant quality improvement when punctuation is added, even in the presence of word errors and even if punctuation is automatic and hence itself may contain further errors. For users living with hearing loss or deafness, an even higher, clear preference for the punctuated captions could be confirmed.
- Is Part Of:
- Computer speech & language. Volume 63(2020)
- Journal:
- Computer speech & language
- Issue:
- Volume 63(2020)
- Issue Display:
- Volume 63, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 63
- Issue:
- 2020
- Issue Sort Value:
- 2020-0063-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-09
- Subjects:
- Punctuation -- Recurrent neural network -- LSTM -- Maximum entropy -- Low latency -- Real-time modelling -- User-focused evaluation -- Mean opinion score -- Closed captioning
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101076
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13576.xml