A hybrid acoustic model based on PDP coding for resolving articulation differences in low-resource speech recognition. (April 2022)
- Record Type:
- Journal Article
- Title:
- A hybrid acoustic model based on PDP coding for resolving articulation differences in low-resource speech recognition. (April 2022)
- Main Title:
- A hybrid acoustic model based on PDP coding for resolving articulation differences in low-resource speech recognition
- Authors:
- Zhu, Wenbo
Jin, Hao
Chen, Jianwen
Luo, Lufeng
Wang, Jinhai
Lu, Qinghua
Li, Aiyuan
- Abstract:
- Low-resource automatic speech recognition (ASR) remains a challenging task. Traditional low-resource ASR methods fail to capture pronunciation variations and lack sufficient phone frame alignment capability. Some studies have found that pronunciation variations are mainly reflected in the distribution of resonance peaks (formants) of vowels and compound vowels, and that they are particularly prominent in spectrograms. Inspired by this finding, we combine it with deep learning techniques and propose a hybrid acoustic model to address the difficulty of capturing pronunciation variation in low-resource ASR. We introduce a pronunciation difference processing (PDP) block to capture resonance peak variations, and we add an improved GRU network at the back end of the model to enhance the alignment of phone frame states. We also introduce multi-head attention to combine coarse- and fine-grained features of the audio and spectrum, highlighting differences in resonance peaks. Finally, we analyze the effect of different structural parameters and coding positions on the results. Our method was evaluated on the Aidatatang and IBAN datasets. Compared with the mainstream baseline model, adding the PDP module reduces the word error rate (WER) by 1.84% and 0.26% and the SER by 5.2% and 4.3% on the two datasets, respectively. After adding the improved GRU, adding the PDP module reduces WER by 1.92% and 0.38% and SER by 5.6% and 4.4%, respectively. After introducing multi-head attention, adding the PDP module reduces WER by 2.33% and 0.45% and SER by 6.0% and 4.8%, respectively. …
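The abstract reports its results as WER and SER reductions without defining the metrics. The sketch below assumes the standard definitions: WER as the word-level Levenshtein distance (substitutions + deletions + insertions) summed over a corpus and divided by the total number of reference tokens, and SER as sentence error rate, the fraction of utterances whose hypothesis differs from the reference at all. The function names `edit_distance`, `wer` and `ser` are illustrative and not taken from the paper. For Mandarin data such as Aidatatang, error rates may instead be computed over characters; the token unit here is an assumption, and the sketch does not settle whether the reported reductions are absolute or relative.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    # At the start of row i, prev[j] is the distance between the first i-1
    # reference tokens and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference token
                          curr[j - 1] + 1,     # insertion of a hypothesis token
                          prev[j - 1] + cost)  # substitution, or match if cost == 0
        prev = curr
    return prev[-1]


def wer(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level WER (assumed definition): total edit errors / total reference tokens."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def ser(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Sentence error rate (assumed meaning of SER): share of utterances with any error."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)
```

For example, with `refs = [["the", "cat", "sat"], ["hello", "world"]]` and `hyps = [["the", "cat", "sit"], ["hello", "world"]]`, there is one substitution over five reference words, so `wer(refs, hyps)` returns 0.2, and one of two utterances is wrong, so `ser(refs, hyps)` returns 0.5.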
- Is Part Of:
- Applied acoustics. Volume 192 (2022)
- Journal:
- Applied acoustics
- Issue:
- Volume 192 (2022)
- Issue Display:
- Volume 192, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 192
- Issue:
- 2022
- Issue Sort Value:
- 2022-0192-2022-0000
- Publication Date:
- 2022-04
- Subjects:
- Low resource -- Pronunciation differences -- Attention mechanism -- Contextual information
Acoustical engineering -- Periodicals
Periodicals
620.2
- Journal URLs:
- http://www.sciencedirect.com/science/journal/0003682X
http://www.elsevier.com/journals
http://www.elsevier.com/homepage/elecserv.htt
- DOI:
- 10.1016/j.apacoust.2021.108601
- Languages:
- English
- ISSNs:
- 0003-682X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1571.400000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21270.xml