Supervised speech separation combined with adaptive beamforming. (November 2022)
- Record Type:
- Journal Article
- Title:
- Supervised speech separation combined with adaptive beamforming. (November 2022)
- Main Title:
- Supervised speech separation combined with adaptive beamforming
- Authors:
- Šarić, Zoran
Subotić, Miško
Bilibajkić, Ružica
Barjaktarović, Marko
Stojanović, Jasmina
- Abstract:
- Highlights: Proposed method combines supervised speech separation and adaptive beamforming. Adaptive beamforming is conducted by simplified GSC beamformer. Supervised speech separation suppresses noise with ratio mask estimated by DNN. Advantages: auto-calibration, self-steering, adaptation control, post-filtering. Combined method outperforms the individual noise reduction methods.
Abstract: Microphone arrays are a powerful tool for ambient noise suppression. A multi-channel minimum mean square error (MMSE) solution can be factorized into a minimum variance distortionless response beamformer (MVDR) followed by a single-channel Wiener post-filter. MVDR beamformer, as well as its equivalent form of generalized sidelobe canceller (GSC), often does not provide sufficient noise reduction due to its limited ability to reduce diffuse noise and reverberation. Steering and calibration errors also degrade the performance of both MVDR and GSC beamformers. Post-filter can be realized by any single-channel noise reduction method. A modern and promising approach for single-channel noise reduction is formulated as a supervised speech separation (SSS) in which a supervised learning algorithm, typically a deep neural network (DNN), is trained to learn a mapping from the noisy features to a time-frequency representation of the target of interest. In this paper, we combined SSS and adaptive beamforming approaches. Adaptive beamforming is realized by simplified GSC (S-GSC) whose equivalence with MVDR beamformer is also proved in the paper. In the proposed S-GSC beamformer, the conventional beamformer is replaced by the central microphone signal. Steering towards the target speaker needs no direction of arrival (DOA) estimation. Trained DNN of the SSS module estimates ideal ratio mask (IRM) which is used for adaptation of the blocking matrix, calibration of the microphones, adaptation for the adaptive noise canceller, and the post-filtering. The proposed method was tested on 720 utterances of the TIMIT database used as target speech. The reverberant room was simulated by acoustic impulse responses recorded in the real room. Performance analysis was carried out with PESQ, STOI, and SDR measures. The test results showed that the proposed combined method outperforms the individual SSS and S-GSC methods.
- Is Part Of:
- Computer speech & language. Volume 76(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 76(2022)
- Issue Display:
- Volume 76, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 76
- Issue:
- 2022
- Issue Sort Value:
- 2022-0076-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-11
- Subjects:
- Supervised speech separation -- Deep learning -- Ambient noise suppression -- Adaptive beamforming -- LCMV beamformer -- GSC beamformer -- Post-filter
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101409
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21757.xml
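- Illustrative note:
The abstract in this record describes a DNN that estimates an ideal ratio mask (IRM), which is then used for post-filtering among other things. The record does not give the paper's exact mask definition, so the following is only a minimal sketch under a common textbook assumption: the IRM in each time-frequency bin is (speech power / (speech power + noise power)) raised to an exponent of 0.5. The function names, the `beta` and `eps` parameters, and the toy values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Assumed IRM form: (S / (S + N)) ** beta per time-frequency bin.

    speech_power, noise_power: non-negative arrays of equal shape
    (frequency bins x frames). eps guards against division by zero.
    """
    ratio = speech_power / (speech_power + noise_power + eps)
    return ratio ** beta


def apply_mask(noisy_stft, mask):
    """Post-filter sketch: scale each noisy STFT bin by its mask value."""
    return mask * noisy_stft


# Toy 2 x 2 spectrogram powers (hypothetical values for illustration only).
speech_power = np.array([[4.0, 0.0], [1.0, 9.0]])
noise_power = np.array([[0.0, 1.0], [1.0, 1.0]])

mask = ideal_ratio_mask(speech_power, noise_power)

# A noisy STFT with unit magnitude in every bin, so the output shows the mask.
noisy_stft = np.ones((2, 2), dtype=complex)
enhanced_stft = apply_mask(noisy_stft, mask)
```

In a real system the speech and noise powers are unknown at run time, which is why the abstract has a trained DNN estimate the mask from noisy features; the oracle computation here only shows what that network would be trained to approximate.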