RVAD: An unsupervised segment-based robust voice activity detection method. (January 2020)

Record Type:: Journal Article
Title:: RVAD: An unsupervised segment-based robust voice activity detection method. (January 2020)
Main Title:: RVAD: An unsupervised segment-based robust voice activity detection method
Authors:: Tan, Zheng-Hua
Sarkar, Achintya Kr.
Dehak, Najim
Abstract:: Highlights: Proposed an unsupervised segment-based method for robust voice activity detection. Proposed a modified rVAD that uses a computationally fast spectral flatness calculation. Evaluated rVAD in terms of VAD performance using the RATS and Aurora-2 databases. Evaluated rVAD in terms of speaker verification performance using RedDots 2016. rVAD showed favorable performance over existing methods on various difficult tasks.
Abstract: This paper presents an unsupervised segment-based method for robust voice activity detection (rVAD). The method consists of two passes of denoising followed by a voice activity detection (VAD) stage. In the first pass, high-energy segments in a speech signal are detected using an a posteriori signal-to-noise ratio (SNR) weighted energy difference; if no pitch is detected within a segment, the segment is considered a high-energy noise segment and set to zero. In the second pass, the speech signal is denoised by a speech enhancement method, for which several methods are explored. Next, neighbouring frames with pitch are grouped together to form pitch segments, and, based on speech statistics, the pitch segments are further extended from both ends in order to include both voiced and unvoiced sounds, and likely non-speech parts as well. Finally, the a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal to detect voice activity. We evaluate the VAD performance of the proposed method using …
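Note: The pitch-segment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a per-frame boolean pitch decision is already available, treats "neighbouring" as strictly consecutive frames, and replaces the statistics-based extension with a fixed, hypothetical padding of `pad` frames. The function names `group_pitch_segments` and `extend_segments` are invented for illustration.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def group_pitch_segments(has_pitch: List[bool]) -> List[Segment]:
    """Group runs of consecutive pitched frames into segments."""
    segments: List[Segment] = []
    start = None
    for i, voiced in enumerate(has_pitch):
        if voiced and start is None:
            start = i  # a new run of pitched frames begins
        elif not voiced and start is not None:
            segments.append((start, i))  # the run ended at frame i - 1
            start = None
    if start is not None:
        segments.append((start, len(has_pitch)))  # run reaches the last frame
    return segments


def extend_segments(segments: List[Segment], num_frames: int, pad: int) -> List[Segment]:
    """Extend each segment by `pad` frames at both ends (assumed fixed value),
    clip to the signal length, and merge segments that touch or overlap."""
    extended: List[Segment] = []
    for start, end in segments:
        lo = max(0, start - pad)
        hi = min(num_frames, end + pad)
        if extended and lo <= extended[-1][1]:
            prev_lo, prev_hi = extended[-1]
            extended[-1] = (prev_lo, max(prev_hi, hi))
        else:
            extended.append((lo, hi))
    return extended


# Toy input: 12 frames, pitch detected in frames 2-3 and frame 8.
has_pitch = [False, False, True, True, False, False,
             False, False, True, False, False, False]
pitch_segments = group_pitch_segments(has_pitch)
extended_segments = extend_segments(pitch_segments, len(has_pitch), pad=1)
```

In this sketch the extended segments would then be the only regions passed to the final energy-based decision; the actual extension lengths and the a posteriori SNR weighting are defined in the paper and are not reproduced here.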
Is Part Of:: Computer speech & language. Volume 59(2020)
Journal:: Computer speech & language
Issue:: Volume 59(2020)
Issue Display:: Volume 59, Issue 2020 (2020)
Year:: 2020
Volume:: 59
Issue:: 2020
Issue Sort Value:: 2020-0059-2020-0000
Page Start:: 1
Page End:: 21
Publication Date:: 2020-01
Subjects:: a posteriori SNR -- Energy -- Pitch detection -- Spectral flatness -- Speech enhancement -- Voice activity detection -- Speaker verification
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2019.06.005
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 11888.xml