Multimodal speech recognition for unmanned aerial vehicles. (March 2021)

Record Type:: Journal Article
Title:: Multimodal speech recognition for unmanned aerial vehicles. (March 2021)
Main Title:: Multimodal speech recognition for unmanned aerial vehicles
Authors:: Oneață, Dan
Cucu, Horia
Abstract:: Unmanned aerial vehicles (UAVs) are becoming widespread with applications ranging from film-making and journalism to rescue operations and surveillance. Research communities (speech processing, computer vision, control) are starting to explore the limits of UAVs, but their efforts remain somewhat isolated. In this paper we unify multiple modalities (speech, vision, language) into a speech interface for UAV control. Our goal is to perform unconstrained speech recognition while leveraging the visual context. To this end, we introduce a multimodal evaluation dataset, consisting of spoken commands and associated images, which represent the visual context of what the UAV "sees" when the pilot utters the command. We provide baseline results and address two main research directions. First, we investigate the robustness of the system by (i) training it with a partial list of commands, and (ii) corrupting the recordings with outdoor noise. We perform a controlled set of experiments by varying the size of the training data and the signal-to-noise ratio. Second, we look at how to incorporate visual information into our model. We show that we can incorporate visual cues in the pipeline through the language model, which we implemented using a recurrent neural network. Moreover, by using gradient activation maps the system can provide visual feedback to the pilot regarding the UAV's understanding of the command. Our conclusions are that multimodal speech recognition can be …
Is Part Of:: Computers & electrical engineering. Volume 90(2021)
Journal:: Computers & electrical engineering
Issue:: Volume 90(2021)
Issue Display:: Volume 90, Issue 2021 (2021)
Year:: 2021
Volume:: 90
Issue:: 2021
Issue Sort Value:: 2021-0090-2021-0000
Page Start::
Page End::
Publication Date:: 2021-03
Subjects:: Automatic speech recognition -- Multimodal learning -- Domain adaptation -- Unmanned aerial vehicles
Computer engineering -- Periodicals
Electrical engineering -- Periodicals
Electrical engineering -- Data processing -- Periodicals
Ordinateurs -- Conception et construction -- Périodiques
Électrotechnique -- Périodiques
Électrotechnique -- Informatique -- Périodiques
Computer engineering
Electrical engineering
Electrical engineering -- Data processing
Periodicals
Electronic journals
621.302854
Journal URLs:: http://www.sciencedirect.com/science/journal/00457906/
http://www.elsevier.com/journals
DOI:: 10.1016/j.compeleceng.2020.106943
Languages:: English
ISSNs:: 0045-7906
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.680000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 16719.xml