Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts. (30th December 2021)

Record Type:: Journal Article
Title:: Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts. (30th December 2021)
Main Title:: Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts
Authors:: Guerreiro, Nuno Miguel
Rei, Ricardo
Batista, Fernando
Abstract:: This paper proposes a flexible approach for punctuation prediction that can be used to produce state-of-the-art results in a multilingual scenario. We have performed experiments using transcripts of TED Talks from the IWSLT 2017 and IWSLT 2011 evaluation campaigns. Our experiments show that the recognition errors of the ASR output degrade the performance of our models, in line with related literature. Our monolingual models perform consistently in human-edited transcripts of German, Dutch, Portuguese and Romanian, suggesting that commas may be more difficult to predict than periods, using pre-trained contextual models. We have trained a single multilingual model that predicts punctuation in multiple languages and achieves results comparable with the ones achieved by monolingual models, revealing evidence of the potential of using a single multilingual model to solve the task for multiple languages. Then, we argue that current punctuation systems in the literature are implicitly dependent on correct segmentation of ASR outputs, because they rely on positional information to solve the punctuation task. This is too strong a requirement for use in a real-life application. Through several experiments, we show that our method to train and test models is more robust to different segmentation. These contributions are of particular importance in our multilingual pipeline, since they avoid training a different model for each of the involved languages, and they guarantee …
Is Part Of:: Expert systems with applications. Volume 186(2021)
Journal:: Expert systems with applications
Issue:: Volume 186(2021)
Issue Display:: Volume 186, Issue 2021 (2021)
Year:: 2021
Volume:: 186
Issue:: 2021
Issue Sort Value:: 2021-0186-2021-0000
Page Start:
Page End:
Publication Date:: 2021-12-30
Subjects:: Punctuation marks -- Intelligent subtitles -- Pre-trained embeddings -- Speech transcripts -- Sentence boundaries -- Multilingual embeddings
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
Journal URLs:: http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
DOI:: 10.1016/j.eswa.2021.115740
Languages:: English
ISSNs:: 0957-4174
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 19627.xml