Convolutional recurrent neural network with attention for Vietnamese speech to text problem in the operating room. (3rd May 2021)
- Record Type:
- Journal Article
- Title:
- Convolutional recurrent neural network with attention for Vietnamese speech to text problem in the operating room. (3rd May 2021)
- Main Title:
- Convolutional recurrent neural network with attention for Vietnamese speech to text problem in the operating room
- Authors:
- Dat, Trinh Tan
Dang, Le Tran Anh
Sang, Vu Ngoc Thanh
Thuy, Le Nhi Lam
Bao, Pham The
- Abstract:
- We introduce an automatic Vietnamese speech recognition (ASR) system for converting Vietnamese speech to text under real operating room ambient noise recorded during liver surgery. First, we propose combining a convolutional neural network (CNN) with a bidirectional long short-term memory (BLSTM) network to investigate local speech feature learning, sequence modelling, and transcription for speech recognition. We also extend the CNN-LSTM framework with an attention mechanism to decode the frames into a sequence of words. The CNN, LSTM and attention models are combined into a unified architecture. In addition, we combine the connectionist temporal classification (CTC) and attention loss functions in the training phase. The length of the output label sequence from CTC is applied to the attention-based decoder predictions to produce the final label sequence. This process helps to reduce irregular alignments and to speed up label sequence estimation during training and inference, instead of relying only on the data-driven attention-based encoder-decoder to estimate the label sequence in long sentences. The proposed system is evaluated using a real operating room database. The results show that our method significantly enhances the performance of the ASR system. We find that our approach provides 13.05% in word error rate (WER) and outperforms standard methods.
- Is Part Of:
- International journal of intelligent information and database systems. Volume 14:Number 3(2021)
- Journal:
- International journal of intelligent information and database systems
- Issue:
- Volume 14:Number 3(2021)
- Issue Display:
- Volume 14, Issue 3 (2021)
- Year:
- 2021
- Volume:
- 14
- Issue:
- 3
- Issue Sort Value:
- 2021-0014-0003-0000
- Page Start:
- 294
- Page End:
- 314
- Publication Date:
- 2021-05-03
- Subjects:
- Vietnamese speech recognition -- convolutional neural network -- CNN -- bidirectional long short-term memory -- BLSTM -- attention -- operating room
Database management -- Computer programs -- Periodicals
Information retrieval -- Computer programs -- Periodicals
Information storage and retrieval systems -- Computer programs -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Intelligent agents (Computer software) -- Periodicals
006.33
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijiids
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1751-5858
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15975.xml