Investigating topics, audio representations and attention for multimodal scene-aware dialog. (November 2020)

Record Type:: Journal Article
Title:: Investigating topics, audio representations and attention for multimodal scene-aware dialog. (November 2020)
Main Title:: Investigating topics, audio representations and attention for multimodal scene-aware dialog
Authors:: Kumar, Shachi H.
Okur, Eda
Sahay, Saurav
Huang, Jonathan
Nachman, Lama
Abstract:: Highlights: Present various architectural extensions to DSTC7 AVSD track baseline. Topics can be used to represent context of a dialog - improves performance. Exploration of multiple attention mechanisms during response generation. End-to-end audio classification convnet called AclNet improves performance. Abstract: With the recent advancements in Artificial Intelligence (AI), Intelligent Virtual Assistants (IVA) such as Alexa and Google Home have become a ubiquitous part of every home. Currently, such IVAs are mostly audio-based, but going forward, we are witnessing a confluence of vision, speech and dialog system technologies that are enabling the IVAs to learn audio-visual groundings of utterances. This will enable agents to have conversations with users about the objects, activities and events surrounding them. As part of the 7th Dialog System Technology Challenges (DSTC7), for the Audio Visual Scene-Aware Dialog (AVSD) track, we explore three main techniques for multimodal dialog: 1) exploring 'topics' of the dialog as an important contextual feature for scene-aware conversations, 2) investigating several multimodal attention mechanisms during response generation and 3) incorporating an end-to-end audio classification sub-network (AclNet) into our architecture. We present detailed analysis of our experiments and show that our model variations outperform the baseline system presented for this task.
Is Part Of:: Computer speech & language. Volume 64(2020)
Journal:: Computer speech & language
Issue:: Volume 64(2020)
Issue Display:: Volume 64, Issue 2020 (2020)
Year:: 2020
Volume:: 64
Issue:: 2020
Issue Sort Value:: 2020-0064-2020-0000
Page Start:
Page End:
Publication Date:: 2020-11
Subjects:: AI -- Intelligent assistants -- Multimodal understanding -- Response generation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2020.101102
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 13392.xml