Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise. (18th March 2008)
- Record Type:
- Journal Article
- Title:
- Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise. (18th March 2008)
- Main Title:
- Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise
- Authors:
- Syed, Waheeduddin Q.
Wu, Hsiao-Chun
- Other Names:
- Chazan, Dan (Academic Editor)
- Abstract:
- The voice activity detection (VAD) is crucial in all kinds of speech applications. However, almost all existing VAD algorithms suffer from the nonstationarity of both speech and noise. To combat this difficulty, we propose a new voice activity detector, which is based on the Mel-energy features and an adaptive threshold related to the signal-to-noise ratio (SNR) estimates. In this paper, we first justify the robustness of the Bayes classifier using the Mel-energy features over that using the Fourier spectral features in various noise environments. Then, we design an algorithm using the dynamic Mel-energy estimator and the adaptive threshold, which depends on the SNR estimates. In addition, a realignment scheme is incorporated to correct the sparse-and-spurious noise estimates. Numerous simulations are carried out to evaluate the performance of our proposed VAD method and the comparisons are made with a couple of existing representative schemes, namely, the VAD using the likelihood ratio test with Fourier spectral energy features and that based on the enhanced time-frequency parameters. Three types of noises, namely, white noise (stationary), babble noise (nonstationary), and vehicular noise (nonstationary) were artificially added by the computer for our experiments. As a result, our proposed VAD algorithm significantly outperforms other existing methods as illustrated by the corresponding receiver operating characteristics (ROC) curves. Finally, we demonstrate one of the major applications, namely, speech waveform compression associated with our new robust VAD scheme and quantify the effectiveness in terms of compression efficiency.
- Is Part Of:
- EURASIP journal on audio, speech, and music processing. Volume 2008(2008)
- Journal:
- EURASIP journal on audio, speech, and music processing
- Issue:
- Volume 2008(2008)
- Issue Display:
- Volume 2008, Issue 2008 (2008)
- Year:
- 2008
- Volume:
- 2008
- Issue:
- 2008
- Issue Sort Value:
- 2008-2008-2008-0000
- Page Start:
- Page End:
- Publication Date:
- 2008-03-18
- Subjects:
- Sound -- Recording and reproducing -- Digital techniques -- Periodicals
Computer sound processing -- Periodicals
Computer sound processing
Sound -- Recording and reproducing -- Digital techniques
Periodicals
Electronic journal
Electronic journals
620.2
- Journal URLs:
- https://asmp-eurasipjournals.springeropen.com/
http://www.hindawi.com/GetJournal.aspx?journal=ASMP
http://www.hindawi.com/journals/asmp/contents.html
http://link.springer.com/
- DOI:
- 10.1155/2008/639839
- Languages:
- English
- ISSNs:
- 1687-4714
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10486.xml