Overlapped Speech Detection and speaker counting using distant microphone arrays. (March 2022)

Record Type:: Journal Article
Title:: Overlapped Speech Detection and speaker counting using distant microphone arrays. (March 2022)
Main Title:: Overlapped Speech Detection and speaker counting using distant microphone arrays
Authors:: Cornell, Samuele
Omologo, Maurizio
Squartini, Stefano
Vincent, Emmanuel
Abstract:: We study the problem of detecting and counting simultaneous, overlapping speakers in a multichannel, distant-microphone scenario. Focusing on a supervised learning approach, we treat Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), joint VAD and OSD (VAD+OSD) and speaker counting in a unified way, as instances of a general Overlapped Speech Detection and Counting (OSDC) multi-class supervised learning problem. We consider a Temporal Convolutional Network (TCN) and a Transformer-based architecture for this task, and compare them with previously proposed state-of-the-art methods based on Recurrent Neural Networks (RNN) or hybrid Convolutional-Recurrent Neural Networks (CRNN). In addition, we propose ways of exploiting multichannel input by means of early or late fusion of single-channel features with spatial features extracted from one or more microphone pairs. We conduct an extensive experimental evaluation on the AMI and CHiME-6 datasets and on a purposely made multichannel synthetic dataset. We show that the Transformer-based architecture performs best among all architectures and that neural network based spatial localization features outperform signal-based spatial features and significantly improve performance compared to single-channel features only. Finally, we find that training with a speaker counting objective improves OSD compared to training with a VAD+OSD objective. Highlights: We study the problem of detecting multiple speakers in a …
Is Part Of:: Computer speech & language. Volume 72(2022)
Journal:: Computer speech & language
Issue:: Volume 72(2022)
Issue Display:: Volume 72, Issue 2022 (2022)
Year:: 2022
Volume:: 72
Issue:: 2022
Issue Sort Value:: 2022-0072-2022-0000
Page Start:
Page End:
Publication Date:: 2022-03
Subjects:: Voice activity detection -- Overlapped Speech Detection -- Speaker counting -- Distant microphones -- Spatial features
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2021.101306
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 20100.xml