Channel and channel subband selection for speaker diarization. (September 2022)
- Record Type:
- Journal Article
- Title:
- Channel and channel subband selection for speaker diarization. (September 2022)
- Main Title:
- Channel and channel subband selection for speaker diarization
- Authors:
- Ahmed, Ahmed Isam
Chiverton, John P.
Ndzi, David L.
Al-Faris, Mahmoud M.
- Abstract:
- Speaker diarization can be considered one of the more complex problems in speaker recognition. A reliable diarization system should be able to accurately determine the variable-length utterances that each speaker contributes to a multi-speaker conversation. This is a difficult problem, since text-independent speaker identification and verification have yet to be improved enough to be applied reliably. While efficient speaker modelling is important for diarization, the acoustic representation of speech is the basic entity that characterizes a speaker. This representation should be distinctive enough to prevent a speaker's utterances from being lost in the acoustic congestion imposed by the other talkers. For this purpose, it is proposed here that, for multiple-microphone diarization, the multiple speech signals be used in acoustic feature extraction instead of being combined beforehand. The aim is to make optimal use of these signals in order to enrich the quality of the speaker's acoustic representation. To this end, and since not all microphone signals (channels) may be desirable, two selection approaches are proposed in this work: a best-quality channel selection method and a novel approach to diverse channel selection. Furthermore, a novel method is proposed that retains the speech spectrum from selected least-reverberated subbands of the available channels' spectra. A new model, referred to here as Averaged Joint Gradient (AJG), is introduced for this purpose. The proposed approach reduces the Diarization Error Rate (DER) in both of the diarization systems used in the evaluations. The first system is based on binary keys and achieves a maximum relative reduction in DER of 14%. The second is a Gaussian Mixture Model-Bayesian Information Criterion (GMM-BIC) based system, which achieves a maximum relative reduction in DER of 20%.
Highlights: New acoustic front-ends for multiple-microphone speaker diarization are introduced. The front-ends include diverse and good-quality channel selection. Subband channel selection is also introduced to mitigate subband reverberation. The proposed methods show improved performance compared to acoustic beamforming.
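The abstract reports its gains as relative reductions in DER. As a hedged illustration that is not taken from the article, the sketch below assumes the common NIST-style definition of DER (false alarm + missed speech + speaker confusion, divided by total scored reference speech time) and uses purely hypothetical durations. It shows that a 20% relative reduction means the error shrinks by one fifth of its baseline value, not by 20 percentage points. The function names are hypothetical helpers for this example only.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """DER assuming the NIST-style definition: erroneous time / scored speech time.

    All arguments are durations in seconds over the scored regions.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


def relative_reduction(baseline_der: float, new_der: float) -> float:
    """Relative DER reduction: (baseline - new) / baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical durations (seconds) for one recording; these are NOT results from the article.
baseline = diarization_error_rate(false_alarm=30.0, missed=45.0, confusion=75.0, total_speech=1000.0)
improved = diarization_error_rate(false_alarm=30.0, missed=45.0, confusion=45.0, total_speech=1000.0)

print(f"baseline DER = {baseline:.1%}, improved DER = {improved:.1%}")
print(f"relative reduction = {relative_reduction(baseline, improved):.0%}")
```

Here the DER falls from 15.0% to 12.0%, an absolute drop of only 3 percentage points, yet a relative reduction of 20%.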
- Is Part Of:
- Computer speech & language. Volume 75(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 75(2022)
- Issue Display:
- Volume 75, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 75
- Issue:
- 2022
- Issue Sort Value:
- 2022-0075-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Speaker diarization -- Channel selection -- Reverberation -- Acoustic beamforming
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101367
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26862.xml