Classification of aspirated and unaspirated sounds in speech using excitation and signal level information. (July 2020)
- Record Type:
- Journal Article
- Title:
- Classification of aspirated and unaspirated sounds in speech using excitation and signal level information. (July 2020)
- Main Title:
- Classification of aspirated and unaspirated sounds in speech using excitation and signal level information
- Authors:
- Ramteke, Pravin Bhaskar
Supanekar, Sujata
Koolagudi, Shashidhar G.
- Abstract:
- Highlights: In this work, consonant aspiration and unaspiration phenomena are studied. The main contribution of the work is a set of features extracted from the burst region of consonants, where air is exhaled during the pronunciation of aspirated and unaspirated sounds.
The effect of this exhalation on the pronunciation of aspirated and unaspirated sounds is analysed using the glottal volume velocity waveform, which gives a measure of the open, closed and return phases of the vocal folds during aspiration and unaspiration. During aspiration, most of the energy is spent exhaling air from the lungs. This exhalation reduces the strength available for producing the vowel that follows, so the vocal fold vibration immediately after the aspiration is weak. This results in longer open, closed and return phases of vocal fold vibration. For unaspirated sounds, only a small volume of air is exhaled when the constriction is released, so enough strength remains for vocal fold vibration in the immediately following vowel. This leads to a comparatively high rate of vocal activity, in which one observes a sharper, more sudden opening of the vocal folds, a sharp return to the closed phase and a very short closed phase. These observations motivate the features of the proposed approach: the durations of the opening, return and closed phases, along with their statistical variations, are the main contribution of this work. In addition to the proposed features, signal-level features are considered that capture the time taken by vocal activity to reach the steady vowel region (the rate of rise in signal strength across the consonant-to-vowel transition), the voice onset time (VOT) and the properties of the consonant burst region. Three datasets, TIMIT, IIIT Hyderabad Marathi and IIIT Hyderabad Hindi (IIIT-H Indic Speech Databases), are used to test the proposed approach. Random forest, support vector machine and deep feed-forward neural network (DFFNN) classifiers are used to evaluate the effectiveness of the features for this task.
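The phase-duration features described in the highlights can be illustrated with a minimal Python sketch. It assumes one cycle of the glottal volume velocity waveform has already been isolated, and it splits that cycle with a hypothetical threshold rule: the function names `glottal_phase_durations` and `phase_statistics` and the 5% `closed_frac` level are illustrative assumptions, not the authors' definitions. The mean and standard deviation across cycles stand in for the "statistical variations" mentioned above.

```python
import numpy as np


def glottal_phase_durations(flow, fs, closed_frac=0.05):
    """Return (open, return, closed) durations in seconds for one glottal cycle.

    Hypothetical rule, not the paper's: samples at or below
    min + closed_frac * (max - min) count as closed; the open phase runs from
    the first sample above that level to the flow peak, and the return phase
    from the peak to the first later sample that falls back to that level.
    Assumes the cycle has non-zero amplitude.
    """
    flow = np.asarray(flow, dtype=float)
    base = flow.min()
    level = base + closed_frac * (flow.max() - base)
    start = int(np.flatnonzero(flow > level)[0])   # glottal opening instant
    peak = int(np.argmax(flow))                     # maximum glottal flow
    back = np.flatnonzero(flow[peak:] <= level)
    end = peak + int(back[0]) if back.size else len(flow)  # closure instant
    open_s = (peak - start) / fs
    return_s = (end - peak) / fs
    closed_s = (len(flow) - (end - start)) / fs     # samples outside [start, end)
    return open_s, return_s, closed_s


def phase_statistics(cycles, fs):
    """Mean and standard deviation of each phase duration over several cycles."""
    d = np.array([glottal_phase_durations(c, fs) for c in cycles])
    return {"mean": d.mean(axis=0), "std": d.std(axis=0)}


# Toy 10-sample cycle: open for one sample, return for two, closed for seven.
cycle = [0, 0, 0, 5, 10, 5, 0, 0, 0, 0]
print(glottal_phase_durations(cycle, fs=1))  # -> (1.0, 2.0, 7.0)
```

Under this rule, a weak, slowly rising vowel onset after aspiration would show up as larger open and return durations, which is the contrast the highlights describe.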
Correlation-based feature selection is performed to select an optimal subset of the proposed features. The selected features are evaluated with the random forest, support vector machine and deep feed-forward neural network classifiers, and they achieve performance equivalent to that of the complete feature set. The performance of the proposed features in recognizing aspirated and unaspirated phonemes is also evaluated on the IIIT Hyderabad Marathi dataset, where recognition of aspirated and unaspirated sounds improves with the proposed features compared with an MFCC-based phoneme recognition system.
Abstract: In this work, consonant aspiration and unaspiration phenomena are studied. Aspiration and unaspiration are known to be characterized by the 'puff of air', also known as the burst, released at the place of constriction in the vocal tract. Here, the properties of the vowel immediately following the burst are studied to characterize the burst. The excitation source signal, estimated from speech as the low-pass filtered linear prediction residual, is used for this task. Signal characteristics such as the glottal pulse; the durations of the open, closed and return phases; the slopes of the open and return phases; the burst duration; the ratio of the highest to the lowest frame-wise signal energy; and the voice onset point are explored as features to characterize aspiration and unaspiration. Three datasets, TIMIT, IIIT Hyderabad Marathi and IIIT Hyderabad Hindi (IIIT-H Indic Speech Databases), are used to verify the proposed approach. Random forest, support vector machine and deep feed-forward neural network (DFFNN) classifiers are used to test the effectiveness of the features for this task. Optimal features for classification are selected using correlation-based feature selection (CFS).
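The excitation source estimate and the frame-wise energy-ratio feature in the abstract can be sketched with NumPy alone. In this sketch the LP coefficients come from the autocorrelation method (normal equations solved directly), the residual is obtained by inverse filtering, and a Hamming-windowed sinc FIR stands in for the low-pass filter. The LP order of 10, the 1000 Hz cutoff, the 101 taps and all function names are illustrative assumptions; the abstract does not state the authors' filter design or parameter values.

```python
import numpy as np


def lp_coefficients(x, order):
    """LP coefficients a_1..a_p by the autocorrelation method."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:n + order]   # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])               # Toeplitz normal equations


def lp_residual(x, order=10):
    """Inverse filter: e[n] = x[n] - sum_k a_k x[n-k]."""
    a = lp_coefficients(x, order)
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(np.asarray(x, dtype=float), inverse_filter)[:len(x)]


def lowpass(x, fs, cutoff_hz=1000.0, taps=101):
    """Hamming-windowed sinc FIR low-pass, centred so the output stays time-aligned.

    Assumes len(x) >= taps so that 'same' mode returns len(x) samples.
    """
    m = np.arange(taps) - (taps - 1) / 2
    fc = cutoff_hz / fs
    h = 2 * fc * np.sinc(2 * fc * m) * np.hamming(taps)
    h /= h.sum()                                   # unit gain at DC
    return np.convolve(x, h, mode="same")


def excitation_estimate(x, fs, order=10, cutoff_hz=1000.0):
    """Low-pass filtered LP residual used as the excitation source signal."""
    return lowpass(lp_residual(x, order), fs, cutoff_hz)


def energy_ratio(x, frame_len, hop):
    """Ratio of the highest to the lowest frame-wise energy."""
    x = np.asarray(x, dtype=float)
    energies = np.array([np.sum(x[i:i + frame_len] ** 2)
                         for i in range(0, len(x) - frame_len + 1, hop)])
    return energies.max() / max(energies.min(), np.finfo(float).eps)
```

Solving the normal equations directly is adequate for a small LP order; the Levinson-Durbin recursion is the usual, cheaper way to solve the same Toeplitz system.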
From the results, the proposed features are observed to classify aspirated and unaspirated consonants effectively. The performance of the proposed features in recognizing aspirated and unaspirated phonemes is also evaluated on the IIIT Hyderabad Marathi dataset, where recognition of aspirated and unaspirated sounds improves with the proposed features compared with an MFCC-based phoneme recognition system.
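The correlation-based feature selection (CFS) step named in the abstract is commonly scored with Hall's merit, Merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-to-class correlation and r_ff the mean feature-to-feature correlation of a k-feature subset. The sketch below is a hedged illustration only: it substitutes absolute Pearson correlation for the symmetrical-uncertainty measure often used with CFS, pairs the merit with a simple greedy forward search, and uses hypothetical names (`cfs_merit`, `cfs_forward`, `min_gain`). The abstract does not say which correlation measure or search strategy the authors used.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """Hall's CFS merit k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf: mean |Pearson r| between each selected feature and the label.
    r_ff: mean |Pearson r| between pairs of selected features.
    Assumes no feature column is constant.
    """
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for a, i in enumerate(subset) for j in subset[a + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(X, y, min_gain=1e-6):
    """Greedy forward search: add the feature that raises the merit most.

    Stops when no candidate improves the merit by more than min_gain, which
    also guards against round-off when a candidate duplicates a selected feature.
    """
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, best_j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best + min_gain:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        best = score
    return selected
```

A feature that merely duplicates one already selected raises r_ff without raising r_cf, so its merit does not improve and the search stops; this is how CFS trims redundant features while keeping the predictive ones.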
- Is Part Of:
- Computer speech & language. Volume 62(2020)
- Journal:
- Computer speech & language
- Issue:
- Volume 62(2020)
- Issue Display:
- Volume 62, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 62
- Issue:
- 2020
- Issue Sort Value:
- 2020-0062-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-07
- Subjects:
- Aspiration -- Excitation source signal -- Glottal pulse features -- Hidden Markov model -- Linear prediction residual -- Random forest -- Support vector machine -- Unaspiration
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2019.101057
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12956.xml