Signal-aware direction-of-arrival estimation using attention mechanisms. (September 2022)
- Record Type:
- Journal Article
- Title:
- Signal-aware direction-of-arrival estimation using attention mechanisms. (September 2022)
- Main Title:
- Signal-aware direction-of-arrival estimation using attention mechanisms
- Authors:
- Mack, Wolfgang
Wechsler, Julian
Habets, Emanuël A.P.
- Abstract:
- The direction-of-arrival (DOA) of sound sources is an essential acoustic parameter used, e.g., for multi-channel speech enhancement or source tracking. Complex acoustic scenarios consisting of sources-of-interest, interfering sources, reverberation, and noise make the estimation of the DOAs corresponding to the sources-of-interest a challenging task.
Recently proposed attention mechanisms allow DOA estimators to focus on the sources-of-interest and disregard interference and noise, i.e., they are signal-aware. The attention is typically obtained by a deep neural network (DNN) from a short-time Fourier transform (STFT) based representation of a single microphone signal. Subsequently, attention has been applied as binary or ratio weighting to STFT-based microphone signal representations to reduce the impact of frequency bins dominated by noise, interference, or reverberation. The impact of attention on DOA estimators and different training strategies for attention and DOA DNNs have not yet been studied in depth. In this paper, we evaluate systems consisting of different DNNs and signal-processing-based methods for DOA estimation when attention is applied. Additionally, we propose training strategies for attention-based DOA estimation optimized via a DOA objective, i.e., end-to-end. The evaluation of the proposed and the baseline systems is performed using data generated with simulated and measured room impulse responses of a uniform linear microphone array under various acoustic conditions, such as reverberation times, noise, and source-array distances. The data contains a single source-of-interest, noise, and directional interference. The best-performing systems are also evaluated using measured data. Our experiments show that DNNs used for DOA estimation are biased toward the spectral source characteristics and the spectral attention distribution used during training (e.g., spectrally flat/sparse). We also show that this bias in the DOA estimator can be avoided if signal-processing methods are used in combination with attention. Overall, DOA estimation using attention in combination with signal-processing methods exhibits a far lower computational complexity than a fully DNN-based system while yielding comparable results.
Highlights: Attention enables source-selective direction-of-arrival estimation. Combining data-driven and signal-processing methods reduces complexity. Attention can be estimated using masking concepts from source separation.
- Is Part Of:
- Computer speech & language. Volume 75(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 75(2022)
- Issue Display:
- Volume 75, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 75
- Issue:
- 2022
- Issue Sort Value:
- 2022-0075-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Direction-of-arrival -- Signal-dependent -- Attention -- Deep learning
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101363
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21407.xml