Automated grapheme-to-phoneme conversion for Central Kurdish based on optimality theory. (November 2021)
- Record Type:
- Journal Article
- Title:
- Automated grapheme-to-phoneme conversion for Central Kurdish based on optimality theory. (November 2021)
- Main Title:
- Automated grapheme-to-phoneme conversion for Central Kurdish based on optimality theory
- Authors:
- Mahmudi, Aso
Veisi, Hadi
- Abstract:
- Highlights: Reviewed the phonology and alphabet of Central Kurdish, a low-resourced language. Proposed an adequate rule-based method for G2P conversion of Central Kurdish. Specified and ranked phonological constraints in the framework of Optimality Theory. Showed that the current alphabet of Central Kurdish does not need additional letters.
Abstract: The writing system of Central Kurdish features three cases in which there is no one-to-one mapping between the orthographic letters and the phonemes of the language. Consequently, written words that include these cases may be pronounced in multiple ways. The process of finding the correct pronunciation of written words is called Grapheme-to-Phoneme (G2P) conversion and is a key step in natural language processing tasks such as speech synthesis. As Central Kurdish is a low-resourced language, we present a G2P conversion method based on the phonological rules of the language, rather than on pronunciation dictionaries and data-driven learning methods. After reviewing the phonology and alphabet of the language through the framework of Optimality Theory, we generate all possible pronunciations. Then, by specifying and applying ranked constraints, we eliminate undesirable candidates so as to keep only one well-formed pronunciation per word. The evaluation of our proposed method on two datasets resulted in an overall Phoneme Error Rate (PER) of 0.75%, with 94.71% precision in the detection of the short vowel /i/ and 100% accuracy in the conversion of the letters "ی" and "و". Analysis of these results suggests that there is no need for additional new letters in the current orthographic system of Central Kurdish. This approach also enables us to provide a ranked suggestion list for the manual checking of the few unresolved ambiguous situations.
- Is Part Of:
- Computer speech & language. Volume 70(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 70(2021)
- Issue Display:
- Volume 70, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 70
- Issue:
- 2021
- Issue Sort Value:
- 2021-0070-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- Grapheme-to-phoneme conversion -- Optimality Theory -- Central Kurdish -- Kurdish phonology
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101222
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17252.xml
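
The generate-then-filter procedure summarised in the abstract (produce every possible pronunciation, then apply ranked constraints to select one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the romanised symbols, the `AMBIGUOUS` mapping, the three toy constraints and their ranking are hypothetical and are not the rule set or ranking used in the article.

```python
from itertools import product

# Hypothetical toy inventory (not the article's): each ambiguous grapheme
# can be read either as a glide or as a vowel.
AMBIGUOUS = {"y": ("j", "i"), "w": ("w", "u")}
VOWELS = set("aeiou")


def generate(word):
    """GEN: return every combination of readings for the ambiguous graphemes."""
    options = [AMBIGUOUS.get(ch, (ch,)) for ch in word]
    return ["".join(choice) for choice in product(*options)]


def onset(cand):
    """One violation if the candidate starts with a vowel."""
    return 1 if cand and cand[0] in VOWELS else 0


def no_hiatus(cand):
    """One violation per pair of adjacent vowels."""
    return sum(1 for a, b in zip(cand, cand[1:]) if a in VOWELS and b in VOWELS)


def no_cluster(cand):
    """One violation per pair of adjacent consonants."""
    return sum(
        1 for a, b in zip(cand, cand[1:]) if a not in VOWELS and b not in VOWELS
    )


# Assumed ranking, highest-ranked constraint first.
RANKED = [onset, no_hiatus, no_cluster]


def evaluate(word, constraints=RANKED):
    """EVAL: order candidates by their violation profile.

    Profiles are compared lexicographically, so a single violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    The first element is the winner; the rest form a ranked suggestion list.
    """
    return sorted(generate(word), key=lambda c: tuple(f(c) for f in constraints))


if __name__ == "__main__":
    for w in ("dyar", "kwr", "yar"):
        print(w, "->", evaluate(w))
```

Returning the full sorted list rather than only the winner mirrors the abstract's remark that the approach yields a ranked suggestion list for manual checking of unresolved cases.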