A framework for speaker retrieval and identification through unsupervised learning. (November 2019)
- Record Type:
- Journal Article
- Title:
- A framework for speaker retrieval and identification through unsupervised learning. (November 2019)
- Main Title:
- A framework for speaker retrieval and identification through unsupervised learning
- Authors:
- Campos, Victor de Abreu
Pedronette, Daniel Carlos Guimarães
- Abstract:
- Highlights: A speaker recognition approach based on unsupervised learning for retrieval and identification tasks. The unsupervised learning algorithms employ a rank-based formulation, which can be applied to different features and modeling techniques. Validation of the framework through MFCC and PLP features; VQ and GMM modeling; and RL-Sim and ReckNN unsupervised algorithms. An experimental evaluation conducted on 3 public datasets, considering different speaker recognition tasks. Effectiveness gains up to +56% on retrieval measures obtained by the unsupervised learning approach.
Abstract: Speaker recognition is a task of remarkable relevance, with applications in diversified domains. Recently, mainly due to the facilities in audio-visual content acquisition, the capacity of analyzing growing datasets independent of labeled data has become a crucial advantage. This paper presents a speaker recognition approach based on recent unsupervised learning methods, which do not require any labeled data or user intervention. The approach is organized in terms of a framework which exploits a rank-based formulation. The similarity information defined by speaker modeling techniques is encoded in ranked lists, which are used as input by the unsupervised learning algorithms. Vector quantization, Gaussian mixture models and i-vectors are employed as modeling techniques, while the algorithms RL-Sim and ReckNN are used for unsupervised learning tasks. The framework was experimentally evaluated on query-by-example speaker retrieval and speaker identification tasks, both on clean and noisy speech recordings. An experimental evaluation was conducted on three public datasets, different languages, and recording conditions. Effectiveness gains up to +56% on retrieval measures were obtained through the use of unsupervised learning algorithms over traditional speaker recognition techniques.
- Is Part Of:
- Computer speech & language. Volume 58(2019)
- Journal:
- Computer speech & language
- Issue:
- Volume 58(2019)
- Issue Display:
- Volume 58, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 58
- Issue:
- 2019
- Issue Sort Value:
- 2019-0058-2019-0000
- Page Start:
- 153
- Page End:
- 174
- Publication Date:
- 2019-11
- Subjects:
- Speaker recognition -- Speaker retrieval -- Unsupervised learning -- Vector quantization -- Gaussian mixture model -- i-vector
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2019.04.004
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11148.xml