Multiple time-instances features based approach for reference-free speech quality measurement. (April 2023)
- Record Type:
- Journal Article
- Title:
- Multiple time-instances features based approach for reference-free speech quality measurement. (April 2023)
- Main Title:
- Multiple time-instances features based approach for reference-free speech quality measurement
- Authors:
- Jaiswal, Rahul Kumar
Dubey, Rajesh Kumar
- Abstract:
- This paper investigates the problem of measuring the speech quality of a received speech signal without employing the original speech signal. Speech quality deteriorates because of noise present in the surroundings. To this end, we propose a multiple time-instances (MTI) features-based approach for a reference-free speech quality measurement model. A voice activity detector (VAD) is used primarily to calculate the number of active speech chunks of a speech signal. For these chunks and their successive combinations, here called batches, multi-resolution auditory model (MRAM), mel-frequency cepstral coefficients (MFCC) and line spectral frequencies (LSF) features are extracted and termed MTI features. It is hypothesized that the MTI features are capable of capturing the distortions caused by time-localized effects of short-time transients, impulsive noise, and its differences from plosive sounds. The MTI metric estimates (MTI-ME) corresponding to these MTI features are calculated using the Gaussian mixture model (GMM) probabilistic technique. The overall objective speech quality of a speech signal is then determined as a linear combination of optimally weighted MTI-ME corresponding to distinct active speech chunks and their successive combinations, that is, batches of that speech signal. The optimal weights are computed using either the minimum mean square error criterion or Pearson's correlation maximization criterion. In addition, a deep neural network (DNN)-based speech quality model is developed for calculating a single objective speech quality while considering all active speech chunks together. Pearson's correlation coefficient and weighted average correlation are used to evaluate performance. Results demonstrate that the proposed model achieves promising improvement over the standard speech quality model (P.563) and improves correlation values by around 37%.
Highlights: Addressing the speech quality measurement problem by developing a reference-free speech quality model. Developing a feature extraction technique incorporating distinct auditory features. Developing a joint GMM training algorithm for computing objective speech quality. Developing a deep neural network (DNN) framework for validating the proposed algorithm. Demonstrating better performance as compared to the standard speech quality model (P.563).
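The weighting step described in the abstract (a linear combination of per-batch MTI-ME values, with weights chosen by a minimum mean square error criterion and performance judged by Pearson's correlation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the number of batches, and the synthetic scores are all assumptions made for illustration, and ordinary least squares is used here as one standard way to obtain minimum mean square error weights.

```python
import numpy as np

def fit_mmse_weights(estimates, subjective):
    """Least-squares weights w minimising ||estimates @ w - subjective||^2.

    estimates:  (n_signals, n_batches) per-batch metric estimates (hypothetical)
    subjective: (n_signals,) subjective quality scores (hypothetical)
    """
    weights, *_ = np.linalg.lstsq(estimates, subjective, rcond=None)
    return weights

def combine(estimates, weights):
    """Overall objective quality as a weighted linear combination."""
    return estimates @ weights

def pearson(a, b):
    """Pearson's correlation coefficient between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data (assumption): 200 signals, 4 batches each.
rng = np.random.default_rng(0)
n_signals, n_batches = 200, 4
true_w = np.array([0.4, 0.3, 0.2, 0.1])
estimates = rng.uniform(1.0, 5.0, size=(n_signals, n_batches))
subjective = estimates @ true_w + rng.normal(0.0, 0.1, size=n_signals)

weights = fit_mmse_weights(estimates, subjective)
predicted = combine(estimates, weights)
print("weights:", np.round(weights, 3))
print("Pearson correlation:", round(pearson(predicted, subjective), 3))
```

The alternative criterion mentioned in the abstract, maximizing Pearson's correlation directly, would need a different optimizer and is not shown here.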
- Is Part Of:
- Computer speech & language. Volume 79(2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 79(2023)
- Issue Display:
- Volume 79, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 79
- Issue:
- 2023
- Issue Sort Value:
- 2023-0079-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- Processed speech -- Voice activity detector -- Multi-resolution auditory model -- Multiple time-instances features -- Deep neural network -- Reference-free speech quality
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101478
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25994.xml