Sequential routing framework: Fully capsule network-based speech recognition. (November 2021)
- Record Type:
- Journal Article
- Title:
- Sequential routing framework: Fully capsule network-based speech recognition. (November 2021)
- Main Title:
- Sequential routing framework: Fully capsule network-based speech recognition
- Authors:
- Lee, Kyungmin
Joe, Hyunwhan
Lim, Hyeontaek
Kim, Kwangyoun
Kim, Sungsoo
Han, Chang Woo
Kim, Hong-Gee
- Abstract:
- Highlights: Capsule network only structures can successfully map sequences to sequences. Mappings are refined by initializing routing iteration based on the previous output. Sequence-wise routing iteration allows for non-iterative inference. Structures of capsule network are more important than the number of parameters. Top layer capsules become similar to the capsule corresponding to a sequence label.
Abstract: Capsule networks (CapsNets) have recently gotten attention as a novel neural architecture. This paper presents the sequential routing framework which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized then sliced by a window size. Each slice is classified to a label at the corresponding time through iterative routing mechanisms. Afterwards, losses are computed by connectionist temporal classification (CTC). During routing, the required number of parameters can be controlled by the window size regardless of the length of sequences by sharing learnable weights across the slices. We additionally propose a sequential dynamic routing algorithm to replace traditional dynamic routing. The proposed technique can minimize decoding speed degradation caused by the routing iterations since it can operate in a non-iterative manner without dropping accuracy. The method achieves a 1.1% lower word error rate at 16.9% on the Wall Street Journal corpus compared to bidirectional long short-term memory-based CTC networks. On the TIMIT corpus, it attains a 0.7% lower phone error rate at 17.5% compared to convolutional neural network-based CTC networks (Zhang et al., 2016).
- Is Part Of:
- Computer speech & language. Volume 70(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 70(2021)
- Issue Display:
- Volume 70, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 70
- Issue:
- 2021
- Issue Sort Value:
- 2021-0070-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- Capsule network -- Automatic speech recognition -- Sequence-to-sequence -- Connectionist temporal classification
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101228
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17320.xml
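
The slicing step described in the abstract (capsulized input cut into windows, with learnable weights shared across the slices) can be loosely sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `slice_windows` and `shared_projection`, the stride of 1, the zero padding, and all shapes are hypothetical, and a plain linear map stands in for capsule routing.

```python
import numpy as np

def slice_windows(seq, window):
    """Cut a (T, D) sequence into T overlapping slices of shape (window, D).

    Assumption: stride 1 with zero padding, so every time step gets one slice.
    """
    T, _ = seq.shape
    left = (window - 1) // 2
    right = window - 1 - left
    padded = np.pad(seq, ((left, right), (0, 0)))  # zero-pad the time axis only
    return np.stack([padded[t:t + window] for t in range(T)])

def shared_projection(slices, weights):
    """Apply the same weight tensor to every slice (a stand-in for routing)."""
    # slices: (T, window, D), weights: (window, D, C) -> scores: (T, C)
    return np.einsum("twd,wdc->tc", slices, weights)

rng = np.random.default_rng(0)
window, dim, classes = 5, 8, 4  # hypothetical sizes
weights = rng.standard_normal((window, dim, classes))

for length in (20, 200):
    seq = rng.standard_normal((length, dim))
    scores = shared_projection(slice_windows(seq, window), weights)
    # One score vector per time step; the weight count does not change with length.
    print(length, scores.shape, weights.size)
```

In this sketch the number of weights is `window * dim * classes` for both sequence lengths, which mirrors the abstract's point that the parameter count is governed by the window size rather than by the length of the sequence. The per-step scores would then be passed to a CTC loss, which is not shown here.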