A neural network approach for speech activity detection for Apollo corpus. (January 2021)
- Record Type:
- Journal Article
- Title:
- A neural network approach for speech activity detection for Apollo corpus. (January 2021)
- Main Title:
- A neural network approach for speech activity detection for Apollo corpus
- Authors:
- Pannala, Vishala
Yegnanarayana, B.
- Abstract:
- Highlights: The speech activity detection task for the Apollo corpus is addressed here, as the speech data is degraded in an unpredictable manner due to the naturalistic conversation in the space mission. An artificial neural network (ANN) model is proposed to capture the discriminating features of speech and degradations. The instantaneous spectral envelopes derived from the single frequency filtering (SFF) analysis of signals are used as input to the ANN model. Even a small amount (10 minutes) of data of speech and noise classes is adequate to train the ANN model. Post-processing of the ANN output is used to meet the evaluation guidelines listed in the Apollo challenge. Performance on the 'dev' and 'eval' datasets, measured as DCF, is 3.1% and 4.6%, respectively, compared with 8.6% and 11.7% for the baseline system.
Abstract: This paper describes a new method for speech activity detection (SAD) based on the recently proposed single frequency filtering (SFF) analysis of speech signals and a neural network model. The SFF analysis gives the instantaneous spectrum of the speech signal at each sampling instant. The frequency resolution of the spectrum is decided by the number of frequencies used in the SFF analysis, which in turn depends on the frequency spacing. Using a frequency spacing of 10 Hz and a sampling frequency of 8 kHz, a 401-dimensional spectrum, covering 0–4 kHz, is obtained at each sampling instant. This is used as a feature vector to train an artificial neural network (ANN) model to discriminate (noisy) speech and nonspeech (mostly noise). The output of the trained ANN model for a given test utterance gives a speech/nonspeech decision at every sampling instant. Post-processing of the decision is used for SAD. The system-generated SAD is evaluated on the Apollo corpus for the SAD task in terms of detection cost function (DCF). The DCF values of the proposed system on the development and evaluation datasets are 3.1% and 4.6%, respectively, whereas the DCF values of the reported baseline system are 8.6% and 11.7%, respectively.
- Is Part Of:
- Computer speech & language. Volume 65(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 65(2021)
- Issue Display:
- Volume 65, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 65
- Issue:
- 2021
- Issue Sort Value:
- 2021-0065-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-01
- Subjects:
- ANN model -- Apollo corpus -- Noisy speech -- Single frequency filtering -- Speech activity detection
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101137
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16886.xml
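
Note on the abstract: the 401-dimensional spectrum follows from sampling 0–4 kHz at 10 Hz spacing (4000 / 10 + 1 = 401 frequencies). The sketch below illustrates how such per-sample envelopes could be computed. It is not taken from the paper: it assumes the commonly described SFF formulation (difference the signal, shift each analysis frequency to half the sampling frequency, apply a single-pole filter, take the magnitude), and the function name `sff_envelopes` and the pole radius `r = 0.99` are illustrative assumptions that this record does not state.

```python
import numpy as np
from scipy.signal import lfilter


def sff_envelopes(signal, fs=8000, spacing_hz=10.0, r=0.99):
    """Sketch of single frequency filtering (SFF) envelopes.

    Returns (freqs, env), where env has shape (num_samples, num_freqs).
    The pole radius r is an assumed value, not one given in the record.
    """
    signal = np.asarray(signal, dtype=float)
    # First difference of the signal (assumed pre-processing step).
    x = np.diff(signal, prepend=0.0)

    # Analysis frequencies 0, 10, ..., fs/2 -> 401 values for fs = 8 kHz.
    freqs = np.arange(0.0, fs / 2 + spacing_hz / 2, spacing_hz)

    # Shift each analysis frequency to fs/2 (normalized frequency pi).
    n = np.arange(len(x))
    shift = np.pi - 2.0 * np.pi * freqs / fs
    modulated = x[:, None] * np.exp(1j * n[:, None] * shift[None, :])

    # Single-pole filter with its pole at z = -r, i.e. near fs/2:
    # y[n] = -r * y[n-1] + x[n]
    y = lfilter([1.0], [1.0, r], modulated, axis=0)

    # The envelope at each frequency is the magnitude of the filter output.
    return freqs, np.abs(y)
```

Each row of `env` is one candidate feature vector per sampling instant, which matches the per-sample speech/nonspeech decision the abstract describes. The memory cost grows as samples × 401, so long recordings would need to be processed in chunks.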