Sequential use of spectral models to reduce deletion and insertion errors in vowel detection. (July 2018)
- Record Type:
- Journal Article
- Title:
- Sequential use of spectral models to reduce deletion and insertion errors in vowel detection. (July 2018)
- Main Title:
- Sequential use of spectral models to reduce deletion and insertion errors in vowel detection
- Authors:
- Kashani, Hamidreza Baradaran
Sayadiyan, Abolghasem
- Abstract:
- Highlights: A novel vowel detection framework is proposed that directly addresses three possible errors in the vowel detection problem. The proposed framework sequentially employs GMM- and SVM-based spectral models to reduce the three errors of vowel deletion, consonant insertion, and vowel insertion. The vowel deletion error, total error, and F-measure are significantly improved by the proposed framework on three different corpora, namely FARSDAT, TIMIT, and TFARSDAT.
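The sequential framework summarized in this record (pick every raw TOC peak, discard consonant peaks, then merge peaks that fall within the same vowel) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a frame-wise short-time energy contour as the TOC, and it replaces the paper's GMM and SVM models with hypothetical callables `is_consonant` and `same_vowel`. Keeping the highest peak of each merged run is also an assumption made for illustration.

```python
from typing import Callable, List, Sequence


def short_time_energy(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Frame-wise energy: a simple stand-in for an energy-based TOC (assumed form)."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def local_maxima(contour: Sequence[float]) -> List[int]:
    """Step 1: indices of strict local maxima, with no smoothing and no threshold.

    Keeping every raw peak lowers vowel deletions but admits extra
    insertions that the later steps must remove. Flat-topped peaks
    (plateaus) are not reported by this simple test.
    """
    return [
        i for i in range(1, len(contour) - 1)
        if contour[i - 1] < contour[i] > contour[i + 1]
    ]


def remove_consonant_peaks(peaks: List[int],
                           is_consonant: Callable[[int], bool]) -> List[int]:
    """Step 2: drop peaks flagged by a (hypothetical) consonant/vowel classifier."""
    return [p for p in peaks if not is_consonant(p)]


def merge_same_vowel_peaks(peaks: List[int],
                           same_vowel: Callable[[int, int], bool],
                           contour: Sequence[float]) -> List[int]:
    """Step 3: merge runs of consecutive peaks judged to lie in the same vowel.

    The classifier is applied to each pair of consecutive peaks; each run
    is reduced to its highest peak (an illustrative assumption).
    """
    groups: List[List[int]] = []
    for k, p in enumerate(peaks):
        if k > 0 and same_vowel(peaks[k - 1], p):
            groups[-1].append(p)
        else:
            groups.append([p])
    return [max(g, key=lambda i: contour[i]) for g in groups]


def detect_vowel_landmarks(contour: Sequence[float],
                           is_consonant: Callable[[int], bool],
                           same_vowel: Callable[[int, int], bool]) -> List[int]:
    peaks = local_maxima(contour)                              # low deletion error
    peaks = remove_consonant_peaks(peaks, is_consonant)        # fewer consonant insertions
    return merge_same_vowel_peaks(peaks, same_vowel, contour)  # fewer vowel insertions


# Toy contour with toy rules standing in for the trained classifiers.
toc = [0, 1, 5, 2, 6, 1, 0, 2, 0, 8, 0]
landmarks = detect_vowel_landmarks(
    toc,
    is_consonant=lambda i: toc[i] < 3,   # hypothetical stand-in for the consonant SVM
    same_vowel=lambda a, b: b - a <= 2,  # hypothetical stand-in for the same-vowel SVM
)
print(landmarks)  # [4, 9]
```

In the toy run, the raw peaks are at indices 2, 4, 7 and 9; the low peak at 7 is removed as a "consonant", and the close pair 2 and 4 is merged into its higher member, 4. Passing the classifiers in as callables keeps the peak bookkeeping separate from whatever spectral features and models a real system would use.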
Abstract: From the perspectives of both speech production and speech perception, vowels, as syllable nuclei, can be considered the most significant speech events. Detection of vowel events in a speech signal is usually performed in two steps. First, a temporal objective contour (TOC), a time-varying measure of vowel similarity, is generated from the speech signal. Second, vowel landmarks, the locations of vowel events, are extracted by locating prominent peaks of the TOC. In this paper, by employing spectral models in a sequential manner, we propose a new framework that directly addresses three possible errors in the vowel detection problem, namely vowel deletion, consonant insertion, and vowel insertion. The proposed framework consists of three main steps. In the first step, two solutions are proposed to reduce the initial vowel deletion error. The first solution is to use the peaks detected by a conventional energy-based TOC, but without TOC smoothing and peak thresholding. The second solution, also aimed at a smaller vowel deletion error, is to use the peaks detected by a spectral-based TOC generated from GMM models. In the second step, a two-class support vector machine (SVM) classifier is adopted to distinguish consonant peaks from vowel peaks; removing the peaks classified as consonants reduces the consonant insertion error. Finally, a two-class SVM classifier is proposed to distinguish consecutive peaks detected within the same vowel from other consecutive peaks. Merging the peaks classified as "same vowel" considerably reduces the vowel insertion error. Experiments are conducted separately on three standard speech corpora, namely FARSDAT, TIMIT, and TFARSDAT, and verify the effectiveness of the techniques proposed to reduce each of the three types of detection error.
The total error (the sum of the three detection errors) and the F-measure are about 9.7% and 95.1%, respectively, for FARSDAT, 17.5% and 91.3% for TIMIT, and 19.6% and 90.2% for TFARSDAT. The evaluation results show that the proposed framework outperforms existing well-known methods in terms of both total error and F-measure on both read and spontaneous speech corpora.
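The abstract defines the total error as the sum of the deletion, consonant insertion and vowel insertion errors, and also reports an F-measure. The sketch below shows one plausible way to compute both from detection counts. It assumes that each error is normalized by the number of reference vowels and that every detected landmark is either a hit, a consonant insertion, or a vowel insertion; the paper's exact normalization and matching rules are not given in this record. The function names and counts are hypothetical.

```python
def detection_errors(n_vowels: int, hits: int,
                     consonant_ins: int, vowel_ins: int) -> dict:
    """Error rates as fractions of the reference vowel count (assumed normalization)."""
    if n_vowels <= 0:
        raise ValueError("n_vowels must be positive")
    if not 0 <= hits <= n_vowels:
        raise ValueError("hits must lie in [0, n_vowels]")
    deletion = (n_vowels - hits) / n_vowels        # reference vowels with no landmark
    consonant_insertion = consonant_ins / n_vowels  # landmarks placed in consonants
    vowel_insertion = vowel_ins / n_vowels          # extra landmarks in a detected vowel
    return {
        "deletion": deletion,
        "consonant_insertion": consonant_insertion,
        "vowel_insertion": vowel_insertion,
        # Total error: the sum of the three detection errors.
        "total": deletion + consonant_insertion + vowel_insertion,
    }


def f_measure(n_vowels: int, hits: int, consonant_ins: int, vowel_ins: int) -> float:
    """Harmonic mean of precision and recall over detected landmarks."""
    if hits == 0:
        return 0.0
    precision = hits / (hits + consonant_ins + vowel_ins)  # correct / all detections
    recall = hits / n_vowels                                # correct / reference vowels
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only, not results from the paper.
errs = detection_errors(n_vowels=100, hits=95, consonant_ins=3, vowel_ins=2)
f = f_measure(n_vowels=100, hits=95, consonant_ins=3, vowel_ins=2)
print(round(errs["total"], 3), round(f, 3))  # 0.1 0.95
```

Under these assumptions, insertion errors lower precision while deletions lower recall, so the F-measure and the total error penalize the same three error types in different ways.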
- Is Part Of:
- Computer speech & language. Volume 50(2018)
- Journal:
- Computer speech & language
- Issue:
- Volume 50(2018)
- Issue Display:
- Volume 50, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 50
- Issue:
- 2018
- Issue Sort Value:
- 2018-0050-2018-0000
- Page Start:
- 105
- Page End:
- 125
- Publication Date:
- 2018-07
- Subjects:
- Vowel landmark detection -- Temporal objective contour (TOC) -- Vowel deletion error -- Consonant insertion error -- Vowel insertion error
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.12.008
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 6115.xml