Unsupervised and supervised VAD systems using combination of time and frequency domain features. (August 2020)
- Record Type:
- Journal Article
- Title:
- Unsupervised and supervised VAD systems using combination of time and frequency domain features. (August 2020)
- Main Title:
- Unsupervised and supervised VAD systems using combination of time and frequency domain features
- Authors:
- Korkmaz, Yunus
Boyacı, Aytuğ
- Abstract:
- Highlights: A novel approach for supervised VAD systems (combining ACF-based pitch calculation with a KNN classifier). A novel approach for unsupervised VAD (1st MFCC, sum of MFCCs and sum of WPT Shannon entropies as the feature set; squaring, normalization, thresholding and median filtering as the processing steps; an OR logic operation as the decision step). Improves VAD in silent environments (with the supervised VAD) and in environments, e.g. telecommunication channels/records, with noise similar to AWGN (with the unsupervised VAD).
Abstract: Voice Activity Detection (VAD), also referred to as Speech Activity Detection (SAD), is the process of identifying speech and non-speech regions in digital speech recordings. It is used as a preliminary stage to reduce errors and increase effectiveness in most speech-based applications, such as automatic speech recognition (ASR), speaker identification/verification, speech enhancement and speaker diarization. In this study, two independent VAD structures were proposed, one unsupervised and one supervised, both using time and frequency domain features. In the supervised approach, autocorrelation-based pitch contour estimation was used together with a 1NN cosine classifier trained on a 21-column feature matrix comprising energy, Zero Crossing Rate (ZCR), 13th-order Mel Frequency Cepstral Coefficients (MFCC) and Shannon entropies of a Daubechies-filtered, depth-5 Wavelet Packet Transform (WPT) to obtain the VAD decision. In the unsupervised approach, methods such as normalization, thresholding and median filtering were applied over the same feature set. The proposed unsupervised VAD achieved error rates of 4%, 19%, 0.02% and 0.7% for FEC, MSC, OVER and NDS, respectively, at 0 dB SNR. The VAD decisions of both the supervised and unsupervised systems showed that the proposed methods can be used efficiently either in silent environments or in environments with noise similar to Additive White Gaussian Noise (AWGN).
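The unsupervised processing chain named in the abstract (squaring, normalization, thresholding, median filtering, then an OR across feature tracks) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in two simpler stand-in features (short-time energy and the peak of the normalized autocorrelation) instead of the MFCC and WPT entropy features, and the frame size, 0.5 threshold, 5-frame median window, lag range and synthetic test signal are all assumptions chosen for illustration.

```python
import math
import random

def frames(signal, size):
    """Split a signal into non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def energy(frame):
    """Short-time energy (stand-in feature, not from the paper)."""
    return sum(s * s for s in frame)

def acf_peak(frame, min_lag=20, max_lag=100):
    """Largest normalized autocorrelation value over an assumed pitch lag range."""
    r0 = sum(s * s for s in frame) or 1e-12
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / r0)
    return best

def normalize(track):
    """Min-max normalize a feature track to the range [0, 1]."""
    lo, hi = min(track), max(track)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in track]

def median_filter(bits, width=5):
    """Median-filter a binary track, replicating the edge values as padding."""
    half = width // 2
    padded = [bits[0]] * half + list(bits) + [bits[-1]] * half
    return [sorted(padded[i:i + width])[half] for i in range(len(bits))]

def track_decision(track, threshold=0.5):
    """Square, normalize, threshold and median-filter one feature track."""
    squared = [v * v for v in track]
    bits = [1 if v > threshold else 0 for v in normalize(squared)]
    return median_filter(bits)

def vad(signal, frame_size=160):
    """Per-frame decision: logical OR of the per-feature decisions."""
    fr = frames(signal, frame_size)
    tracks = [[energy(f) for f in fr], [acf_peak(f) for f in fr]]
    decisions = [track_decision(t) for t in tracks]
    return [int(any(col)) for col in zip(*decisions)]

# Synthetic example (assumption): 1 s at 8 kHz, a 200 Hz tone standing in
# for voiced speech between 0.3 s and 0.7 s, plus white Gaussian noise.
random.seed(0)
fs = 8000
signal = []
for n in range(fs):
    noise = random.gauss(0.0, 0.05)
    tone = math.sin(2 * math.pi * 200 * n / fs) if 2400 <= n < 5600 else 0.0
    signal.append(tone + noise)

decision = vad(signal)
print("".join(map(str, decision)))  # one 0/1 flag per 20 ms frame
```

The OR step means a frame is marked as speech if any single feature track flags it, which favors catching speech over rejecting noise; the median filter removes isolated single-frame flips before the tracks are combined.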
- Is Part Of:
- Biomedical signal processing and control. Volume 61(2020)
- Journal:
- Biomedical signal processing and control
- Issue:
- Volume 61(2020)
- Issue Display:
- Volume 61, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 61
- Issue:
- 2020
- Issue Sort Value:
- 2020-0061-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-08
- Subjects:
- Voice activity detection -- ZCR -- WPT -- MFCC -- ACF based pitch -- KNN classification
Signal processing -- Periodicals
Biomedical engineering -- Periodicals
Signal Processing, Computer-Assisted -- Periodicals
Image Processing, Computer-Assisted -- Periodicals
Biomedical Engineering -- Periodicals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/17468094
http://www.elsevier.com/journals
http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%2329675%232006%23999989998%23626449%23FLA%23&_cdi=29675&_pubType=J&_auth=y&_acct=C000045259&_version=1&_urlVersion=0&_userid=836873&md5=664b5cf9a57fc91971a17faf20c32ec1
- DOI:
- 10.1016/j.bspc.2020.102044
- Languages:
- English
- ISSNs:
- 1746-8094
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2087.880400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23456.xml