Investigating topics, audio representations and attention for multimodal scene-aware dialog. (November 2020)
- Record Type:
- Journal Article
- Title:
- Investigating topics, audio representations and attention for multimodal scene-aware dialog. (November 2020)
- Main Title:
- Investigating topics, audio representations and attention for multimodal scene-aware dialog
- Authors:
- Kumar, Shachi H.
Okur, Eda
Sahay, Saurav
Huang, Jonathan
Nachman, Lama
- Abstract:
- Highlights: Present various architectural extensions to the DSTC7 AVSD track baseline. Topics can be used to represent the context of a dialog, which improves performance. Exploration of multiple attention mechanisms during response generation. An end-to-end audio classification convnet called AclNet improves performance.
Abstract: With the recent advancements in Artificial Intelligence (AI), Intelligent Virtual Assistants (IVAs) such as Alexa and Google Home have become a ubiquitous part of every home. Currently, such IVAs are mostly audio-based, but going forward, we are witnessing a confluence of vision, speech and dialog system technologies that are enabling IVAs to learn audio-visual groundings of utterances. This will enable agents to have conversations with users about the objects, activities and events surrounding them. As part of the 7th Dialog System Technology Challenges (DSTC7), for the Audio Visual Scene-Aware Dialog (AVSD) track, we explore three main techniques for multimodal dialog: 1) exploring 'topics' of the dialog as an important contextual feature for scene-aware conversations, 2) investigating several multimodal attention mechanisms during response generation and 3) incorporating an end-to-end audio classification sub-network (AclNet) into our architecture. We present a detailed analysis of our experiments and show that our model variations outperform the baseline system presented for this task.
- Is Part Of:
- Computer speech & language. Volume 64(2020)
- Journal:
- Computer speech & language
- Issue:
- Volume 64(2020)
- Issue Display:
- Volume 64, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 64
- Issue:
- 2020
- Issue Sort Value:
- 2020-0064-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-11
- Subjects:
- AI -- Intelligent assistants -- Multimodal understanding -- Response generation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101102
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13392.xml