Localizing speakers in multiple rooms by using Deep Neural Networks. (May 2018)

Record Type:: Journal Article
Title:: Localizing speakers in multiple rooms by using Deep Neural Networks. (May 2018)
Main Title:: Localizing speakers in multiple rooms by using Deep Neural Networks
Authors:: Vesperini, Fabio
Vecchiotti, Paolo
Principi, Emanuele
Squartini, Stefano
Piazza, Francesco
Abstract:: Highlights: MLP and CNN architectures for multi-room speaker localization are investigated. Localization is performed by using the microphone signals coming from all rooms. An in-depth study on the effect of the temporal context is conducted. A reduced dependence on the microphone locations inside the room is observed. The CNN approach with temporal context outperforms state-of-the-art algorithms on the DIRHA dataset. Abstract: Source localization algorithms play a fundamental role in human speech capturing systems. In this paper, a Speaker Localization algorithm (SLOC) based on Deep Neural Networks (DNN) is evaluated and compared with state-of-the-art approaches. The speaker position in the room under analysis is determined directly by the DNN, making the proposed algorithm fully data-driven. Two neural network architectures are investigated: the Multi-Layer Perceptron (MLP) and the Convolutional Neural Network (CNN). GCC-PHAT (Generalized Cross Correlation-PHAse Transform) patterns, computed from the audio signals captured by the microphones, are used as input features for the DNN. In particular, a multi-room case study is addressed, where the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is tested on the home-recorded DIRHA dataset, which provides multiple wall and ceiling microphone signals for each room. In detail, the focus is on the speaker localization task in two distinct …
Is Part Of:: Computer speech & language. Volume 49(2018)
Journal:: Computer speech & language
Issue:: Volume 49(2018)
Issue Display:: Volume 49, Issue 2018 (2018)
Year:: 2018
Volume:: 49
Issue:: 2018
Issue Sort Value:: 2018-0049-2018-0000
Page Start:: 83
Page End:: 106
Publication Date:: 2018-05
Subjects:: Acoustic source localization -- Speaker localization -- GCC-PHAT -- Deep Neural Networks -- Convolutional Neural Networks -- Computational Audio Processing
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2017.12.002
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 5619.xml