Semi-supervised speech activity detection with an application to automatic speaker verification. (January 2018)

Record Type:: Journal Article
Title:: Semi-supervised speech activity detection with an application to automatic speaker verification. (January 2018)
Main Title:: Semi-supervised speech activity detection with an application to automatic speaker verification
Authors:: Sholokhov, Alexey
Sahidullah, Md
Kinnunen, Tomi
Abstract:: Highlights: We propose a new speech activity detector (SAD) based on semi-supervised learning of a Gaussian mixture model (GMM). The proposed SAD requires a smaller amount of labeled data for initialization than the GMM-based approach. We have shown improved detection of speech and non-speech frames on the NIST OpenSAD dataset. The proposed SAD gives promising results compared to other SADs in a robust speaker verification task. Abstract: We propose a simple speech activity detector (SAD) based on recording-specific Gaussian mixture modeling (GMM) of speech and non-speech frames. We extend the conventional expectation-maximization (EM) algorithm for GMM training using semi-supervised learning. This provides a methodology for incorporating unlabeled data into the SAD training process, leading to more accurate statistical models by exploiting the structure of the data distribution. It fits naturally into off-line applications that may require partial human assistance, or applications that involve processing large quantities of audio data, such as text-independent speaker verification, speaker diarization or audio surveillance. The proposed SAD does not require any off-line training data as supervised SADs do. Rather, it employs initial labels produced from a tiny fraction of a given audio recording with the help of another, simpler SAD (or a human operator). The proposed SAD is analyzed for different covariance types, initialization methods for the speech and non-speech classes, the …
Is Part Of:: Computer speech & language. Volume 47(2018)
Journal:: Computer speech & language
Issue:: Volume 47(2018)
Issue Display:: Volume 47, Issue 2018 (2018)
Year:: 2018
Volume:: 47
Issue:: 2018
Issue Sort Value:: 2018-0047-2018-0000
Page Start:: 132
Page End:: 156
Publication Date:: 2018-01
Subjects:: Speech activity detection -- Semi-supervised learning -- Gaussian mixture model -- Speaker recognition -- NIST OpenSAD -- NIST SRE
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2017.07.005
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 20786.xml