Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining. Issue 2 (March 2023)
- Record Type:
- Journal Article
- Title:
- Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining. Issue 2 (March 2023)
- Main Title:
- Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining
- Authors:
- Nowakowski, Karol
Ptaszynski, Michal
Murasaki, Kyoko
Nieuważny, Jagna
- Abstract:
- In recent years, neural models learned through self-supervised pretraining on large scale multilingual text or speech data have exhibited promising results for underresourced languages, especially when a relatively large amount of data from related language(s) is available. While the technology has a potential for facilitating tasks carried out in language documentation projects, such as speech transcription, pretraining a multilingual model from scratch for every new language would be highly impractical. We investigate the possibility for adapting an existing multilingual wav2vec 2.0 model for a new language, focusing on actual fieldwork data from a critically endangered tongue: Ainu. Specifically, we (i) examine the feasibility of leveraging data from similar languages also in fine-tuning; (ii) verify whether the model's performance can be improved by further pretraining on target language data. Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language and leads to considerable reduction in error rates. Furthermore, we find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have positive impact on speech recognition performance when there is very little labeled data in the target language.
Highlights: Downstream performance of a multilingual speech representation model on a new, underresourced language can be improved through multilingual fine-tuning and additional pretraining. Continued pretraining on target language data leads to substantially lower error rates in automatic speech transcription. Multilingual fine-tuning with additional data from a related or similar language helps when labeled target language data is scarce.
- Is Part Of:
- Information processing & management. Volume 60:Issue 2(2023)
- Journal:
- Information processing & management
- Issue:
- Volume 60:Issue 2(2023)
- Issue Display:
- Volume 60, Issue 2 (2023)
- Year:
- 2023
- Volume:
- 60
- Issue:
- 2
- Issue Sort Value:
- 2023-0060-0002-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-03
- Subjects:
- Automatic speech transcription -- ASR -- Wav2vec 2.0 -- Pretrained transformer models -- Speech representation models -- Cross-lingual transfer -- Language documentation -- Endangered languages -- Underresourced languages -- Sakhalin Ainu
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2022.103148
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25648.xml