Comparison of rule-based and data-driven approaches for syllabification of simple syllable languages and the effect of orthography. (November 2021)
- Record Type:
- Journal Article
- Title:
- Comparison of rule-based and data-driven approaches for syllabification of simple syllable languages and the effect of orthography. (November 2021)
- Main Title:
- Comparison of rule-based and data-driven approaches for syllabification of simple syllable languages and the effect of orthography
- Authors:
- Asahiah, Franklin Ọládiípọ̀
- Abstract:
- Highlights: The work investigated the performance of two approaches to syllabification for languages with simple syllable structure. Two languages, Igbo and Yorúbá, were chosen as case studies for this investigation. The phonemic and orthographic inventories of the two languages were compared. Rule-based algorithms for syllabification of the two languages were designed and implemented. A machine learner, TiMBL (Tilburg Memory-Based Learner), which implements a k-nearest-neighbor form of data-driven learning, was selected for the data-driven approach. This is quite similar to the algorithm used in Adsett et al. (2009). We created n-gram models of orders 3 to 9 for the data-driven models, and measured the impact of adding linguistic information. Unique to this study is the measurement of the impact of preprocessing the input for digraphs that can be deterministically identified and replaced with a single-letter substitute. Our study showed that the rule-based approach is not inferior to data-driven approaches when the syllabication rules are well designed for languages with simple syllable structure. Our study also showed that syllabication accuracy is likely to be related to orthography, in the sense that the shallower the orthography of a language is, the more accurate the syllabication. We also noted that the more deterministically multigraphs and digraphs can be converted to single letters as a preprocessing step, the greater the likelihood of better syllabication.
Abstract: Syllabication algorithms have been developed for many languages because of the important role that syllables play in language processing. However, efforts continue to find better methods. In this study, we take a second look at the performance of two major approaches to syllabication to help determine their relationship to the structure of languages and their orthographies. We applied syllabification to two languages with simple syllable structure, Igbo and Yoruba. The paper shows that the rule-based approach is not inferior to the data-driven approach, as earlier reported for syllables with more complex structures. In addition, the results from the two languages showed that syllabication performance is affected to a certain extent by the orthography of the language in question, independent of the structure of the language. …
- Is Part Of:
- Computer speech & language. Volume 70(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 70(2021)
- Issue Display:
- Volume 70, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 70
- Issue:
- 2021
- Issue Sort Value:
- 2021-0070-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- Syllables -- Division -- Rules -- data -- Approach -- Comparison -- Orthographies
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101233
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17252.xml
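- Illustrative note:
- The highlights in the abstract above mention two preprocessing ideas: replacing deterministically identifiable digraphs with a single-letter substitute, and building n-gram contexts (orders 3 to 9) for a memory-based learner. The following is a minimal sketch of how such steps might look, not the article's implementation. The substitution table, the placeholder symbol "G", the padding character, the window layout, and the function names `replace_digraphs` and `char_windows` are all assumptions made for illustration; only the Yoruba digraph "gb" itself is a known fact of the orthography. Input is assumed to be lowercase.

```python
# Hypothetical substitution table: "gb" is a genuine Yoruba digraph, but the
# placeholder "G" is an illustrative choice, not taken from the article.
DIGRAPHS = {"gb": "G"}


def replace_digraphs(word, table=DIGRAPHS):
    """Replace each digraph in `word` with its single-character substitute.

    Longer keys are handled first so that a multigraph would win over a
    digraph contained in it.
    """
    for graph in sorted(table, key=len, reverse=True):
        word = word.replace(graph, table[graph])
    return word


def char_windows(word, order=3, pad="_"):
    """Return one character window of width `order` per letter of `word`.

    The focus letter sits at index (order - 1) // 2 of its window, and the
    word is padded so that edge letters still get a full-width window.
    This layout is an assumption; the article may arrange context differently.
    """
    if order < 1:
        raise ValueError("order must be a positive integer")
    left = (order - 1) // 2
    right = order // 2
    padded = pad * left + word + pad * right
    return [padded[i:i + order] for i in range(len(word))]


if __name__ == "__main__":
    cleaned = replace_digraphs("agbado")
    for window in char_windows(cleaned, order=3):
        print(window)
```

Each window could then be paired with a boundary label (for example, whether a syllable break follows the focus letter) to form one training instance for a k-nearest-neighbor learner. Collapsing the digraph first means the learner sees one symbol per consonant rather than two letters that never split.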