Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts. (30th December 2021)
- Record Type:
- Journal Article
- Title:
- Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts. (30th December 2021)
- Main Title:
- Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts
- Authors:
- Guerreiro, Nuno Miguel
Rei, Ricardo
Batista, Fernando
- Abstract:
- This paper proposes a flexible approach for punctuation prediction that can be used to produce state-of-the-art results in a multilingual scenario. We have performed experiments using transcripts of TED Talks from the IWSLT 2017 and IWSLT 2011 evaluation campaigns. Our experiments show that the recognition errors of the ASR output degrade the performance of our models, in line with related literature. Our monolingual models perform consistently in Human-edited transcripts of German, Dutch, Portuguese and Romanian, suggesting that commas may be more difficult to predict than periods, using pre-trained contextual models. We have trained a single multilingual model that predicts punctuation in multiple languages that achieves results comparable with the ones achieved by monolingual models, revealing evidence of the potential of using a single multilingual model to solve the task for multiple languages. Then, we argue that usage of current punctuation systems in the literature are implicitly dependent on correct segmentation of ASR outputs for they rely on positional information to solve the punctuation task. This is too big of a requirement for use in a real life application. Through several experiments, we show that our method to train and test models is more robust to different segmentation. These contributions are of particular importance in our multilingual pipeline, since they avoid training a different model for each of the involved languages, and they guarantee that the model will be more robust to incorrect segmentation of the ASR outputs in comparison with other methods in the literature. To the best of our knowledge, we report the first experiments using a single multilingual model for punctuation restoration in multiple languages.
Highlights: A state-of-the-art approach for multilingual punctuation prediction. Knowledge about punctuation from pre-trained transformer-based encoder models. Monolingual models tested both in human-edited and in automatic transcripts. Single multilingual model predicts punctuation in multiple languages. Integration within an existing multilingual video subtitling pipeline.
- Is Part Of:
- Expert systems with applications. Volume 186(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 186(2021)
- Issue Display:
- Volume 186, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 186
- Issue:
- 2021
- Issue Sort Value:
- 2021-0186-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-12-30
- Subjects:
- Punctuation marks -- Intelligent subtitles -- Pre-trained embeddings -- Speech transcripts -- Sentence boundaries -- Multilingual embeddings
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2021.115740
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19627.xml