Aggregating discriminative embedding by triple-domain feature joint learning with bidirectional sampling for speaker verification. (May 2023)
- Record Type:
- Journal Article
- Title:
- Aggregating discriminative embedding by triple-domain feature joint learning with bidirectional sampling for speaker verification. (May 2023)
- Main Title:
- Aggregating discriminative embedding by triple-domain feature joint learning with bidirectional sampling for speaker verification
- Authors:
- Zi, Yunfei
Xiong, Shengwu
- Abstract:
- Highlights: Improving speaker recognition performance. Proposing triple-domain feature joint learning to aggregate discriminative embeddings for speaker verification. Proposing a bidirectional sampling multi-scale network to capture effective information at different stages and scales. Designing a Fisher feature fusion method that further enhances speaker-individual information and reduces speaker-commonality information. Designing the TribiNet model, which achieves robustness, high performance, and accuracy for speaker recognition.
Abstract: Each axis of speech, including time-domain, frequency-domain, and spectral-domain data, represents a different physical meaning and carries different dimensional information. The time domain focuses on the physical signal versus time, the frequency domain on the amount of signal in a given frequency band, and the spectral domain on the global power of the speech, so using only the spectrogram to represent all of the information in speech for speaker recognition loses much of the detail in the other dimensions. To tackle this limitation, we propose triple-domain feature joint learning to enhance discriminative embedding from more dimensions for text-independent speaker verification. To further aggregate discriminative embedding, each domain uses a novel bidirectional sampling multi-scale feature aggregation network based on Fisher feature fusion to project spectrum features to more discriminative embeddings, termed TribiNet. Extensive experiments are conducted on a text-independent speaker verification dataset generated from the VoxCeleb corpus. The results demonstrate that the proposed method outperforms state-of-the-art deep embedding architectures by at least 12%-58% on the test set. The ablation experiments further illustrate that the proposed approaches achieve substantial improvement over prior methods.
- Is Part Of:
- Biomedical signal processing and control. Volume 83(2023)
- Journal:
- Biomedical signal processing and control
- Issue:
- Volume 83(2023)
- Issue Display:
- Volume 83, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 83
- Issue:
- 2023
- Issue Sort Value:
- 2023-0083-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-05
- Subjects:
- Discriminative embedding -- Triple-domain feature -- Joint learning -- Bidirectional sampling -- Speaker verification
Signal processing -- Periodicals
Biomedical engineering -- Periodicals
Signal Processing, Computer-Assisted -- Periodicals
Image Processing, Computer-Assisted -- Periodicals
Biomedical Engineering -- Periodicals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/17468094
http://www.elsevier.com/journals
http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%2329675%232006%23999989998%23626449%23FLA%23&_cdi=29675&_pubType=J&_auth=y&_acct=C000045259&_version=1&_urlVersion=0&_userid=836873&md5=664b5cf9a57fc91971a17faf20c32ec1
- DOI:
- 10.1016/j.bspc.2023.104703
- Languages:
- English
- ISSNs:
- 1746-8094
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2087.880400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26178.xml