A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions. (September 2020)

Record Type:: Journal Article
Title:: A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions. (September 2020)
Main Title:: A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions
Authors:: Tündik, Máté Ákos
Tarján, Balázs
Szaszák, György
Abstract:: Highlights: Low latency, real-time automatic punctuation model. RNN-based punctuation outperforms the MaxEnt baseline. Subjective tests confirm that humans prefer punctuated captions. Deaf or hard of hearing users prefer automatic punctuation even more.
Abstract: In Automatic Speech Recognition (ASR), inserting the punctuation marks into the word chain hypothesis has long been given low priority, as efforts were concentrated on minimizing word error rates. Punctuation, however, also has a high impact on the transcription quality perceived by the users. Prosody, textual context and their combination have since been used successfully for automatic punctuation of ASR outputs. The recently proposed RNN-based solutions show encouraging performance. We believe that current bottlenecks of punctuation technology are on one hand the complex punctuation models, which, having high latency, are not suitable for use-cases with real-time requirements; and on the other hand, punctuation efforts have not been validated against human perception and user impression. The ambition of this paper is to propose a lightweight, yet powerful RNN punctuation model for on-line (real-time including low latency) environment, and also to assess user opinion, in general and also for target users living with hearing loss or impairment. The proposed on-line RNN punctuation model is evaluated against a Maximum Entropy (MaxEnt) baseline, for Hungarian and for English, whereas subjective assessment tests are …
Is Part Of:: Computer speech & language. Volume 63(2020)
Journal:: Computer speech & language
Issue:: Volume 63(2020)
Issue Display:: Volume 63, Issue 2020 (2020)
Year:: 2020
Volume:: 63
Issue:: 2020
Issue Sort Value:: 2020-0063-2020-0000
Page Start:
Page End:
Publication Date:: 2020-09
Subjects:: Punctuation -- Recurrent neural network -- LSTM -- Maximum entropy -- Low latency -- Real-time modelling -- User-focused evaluation -- Mean opinion score -- Closed captioning
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2020.101076
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 13576.xml