Assessing the effect of visual servoing on the performance of linear microphone arrays in moving human-robot interaction scenarios. (January 2021)
- Record Type:
- Journal Article
- Title:
- Assessing the effect of visual servoing on the performance of linear microphone arrays in moving human-robot interaction scenarios. (January 2021)
- Main Title:
- Assessing the effect of visual servoing on the performance of linear microphone arrays in moving human-robot interaction scenarios
- Authors:
- Díaz, Alejandro
Mahu, Rodrigo
Novoa, Jose
Wuth, Jorge
Datta, Jayanta
Yoma, Nestor Becerra
- Abstract:
- Highlights: The effect of visual servoing in the performance of a linear microphone array regarding distant ASR is assessed in a mobile, dynamic and non-stationary robotic testbed that can be representative of real HRI scenarios. A state-of-the-art mobile robotic testbed had to be set up with target speech and noise sources. This paper focuses on an effect that is rarely addressed in the literature: the dependence of the beamforming directivity gain on look direction. The average reduction in WER achieved when the robot head was steered toward the target speech source was as high as 28.2%.
Abstract: Social robotics is becoming a reality and voice-based human-robot interaction is essential for a successful human-robot collaborative symbiosis. The main objective of this paper is to assess the effect of visual servoing in the performance of a linear microphone array regarding distant ASR in a mobile, dynamic and non-stationary robotic testbed that can be representative of real HRI scenarios. Visual servoing and image target tracking are different tasks, and this paper focuses on an effect that is rarely addressed in the literature: the dependence of the beamforming directivity on look direction. The datasets required to carry out the study reported here do not exist and had to be generated. A state-of-the-art mobile robotic testbed had to be set up with target speech and noise sources. A linear microphone array was chosen as a case of study and its response was measured. Standard beamforming methods were evaluated with respect to visual servoing: delay-and-sum combined with image tracking; weighted delay-and-sum; and, MVDR also combined with image tracking. The results presented here show that the performance of beamforming methods is dramatically degraded in moving and non-stationary conditions. In this context, visual servoing in HRI can significantly improve the performance of a linear microphone array regarding ASR accuracy. The average reduction in WER achieved when the robot head was steered toward the target speech source was as high as 28.2%. Finally, it is worth highlighting that the methodology adopted here is applicable to any microphone array, linear or not.
- Is Part Of:
- Computer speech & language. Volume 65(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 65(2021)
- Issue Display:
- Volume 65, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 65
- Issue:
- 2021
- Issue Sort Value:
- 2021-0065-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-01
- Subjects:
- Human-robot interaction -- Visual servoing -- Beamforming -- Automatic speech recognition -- Source tracking
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101136
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16859.xml