Classification of Javanese Language Level on Articles Using Multinomial Naive Bayes and N-Gram Methods. (August 2019)
- Record Type:
- Journal Article
- Title:
- Classification of Javanese Language Level on Articles Using Multinomial Naive Bayes and N-Gram Methods. (August 2019)
- Main Title:
- Classification of Javanese Language Level on Articles Using Multinomial Naive Bayes and N-Gram Methods
- Authors:
- Ardhana, A P
Cahyani, D E
Winarno
- Abstract:
- Javanese-language articles are now available in electronic media. However, people who are just learning Javanese have difficulty identifying the language level used in an article, because the articles do not state it. It is therefore necessary to classify the language level of Javanese articles, where Javanese is divided into four levels: ngoko, ngoko alus, krama madya, and krama alus. Before classification, the data must be preprocessed. One preprocessing step is stemming, which reduces affixed words to their base forms. The Javanese stemming process in this research follows Indonesian stemming rules based on an adjusted Nazief-Adriani algorithm. Feature extraction was carried out using N-Grams with n = 2, 3, 4 (bigram, trigram, and quadgram). After preprocessing, classification was performed using the Multinomial Naïve Bayes method. Classification often suffers from imbalanced data between categories; to overcome this problem, the SMOTE resampling method is used to balance the data. Classification with N-Gram variations, SMOTE, and stemming reached a maximum accuracy of 67.01% at n = 2. The highest precision, 72.67%, was obtained at n = 4 without stemming but with SMOTE. The highest recall, 67.00%, was obtained at n = 2 with stemming, both with and without SMOTE. The study concludes that stemming rules adapted from the Nazief-Adriani algorithm, with additional stemming steps and an affix list from a Javanese language expert, can be implemented properly for Javanese stemming, and that classification of the Javanese language level using Multinomial Naïve Bayes and N-Gram methods can yield good enough accuracy, precision, and recall.
- Is Part Of:
- Journal of physics. Volume 1306(2019)
- Journal:
- Journal of physics
- Issue:
- Volume 1306(2019)
- Issue Display:
- Volume 1306, Issue 1 (2019)
- Year:
- 2019
- Volume:
- 1306
- Issue:
- 1
- Issue Sort Value:
- 2019-1306-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-08
- Subjects:
- Physics -- Congresses
530.5
- Journal URLs:
- http://www.iop.org/EJ/journal/1742-6596
http://ioppublishing.org/
- DOI:
- 10.1088/1742-6596/1306/1/012049
- Languages:
- English
- ISSNs:
- 1742-6588
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5036.223000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11885.xml
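
As a rough illustration of the approach summarized in the abstract, the sketch below combines N-Gram feature extraction with a Multinomial Naive Bayes classifier in plain Python. It is not the authors' implementation. Several things are assumptions made only for this example: the use of character-level N-Grams (the abstract does not say whether N-Grams are taken over characters or words), Laplace smoothing with alpha = 1, the names char_ngrams and MultinomialNB, and the four toy sentences with simplified "ngoko" and "krama" labels. Stemming and SMOTE resampling, both used in the article, are omitted here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class MultinomialNB:
    """Minimal Multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram size (2 = bigram)
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text, self.n))
        # Vocabulary: every n-gram seen in any training text.
        self.vocab = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus summed log likelihoods of the observed n-grams.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (illustrative only, not taken from the article).
train_texts = [
    "aku arep mangan",
    "kowe arep lunga",
    "kula badhe nedha",
    "panjenengan badhe tindak",
]
train_labels = ["ngoko", "ngoko", "krama", "krama"]

model = MultinomialNB(n=2).fit(train_texts, train_labels)
print(model.predict("aku arep lunga"))
print(model.predict("kula badhe tindak"))
```

Changing n to 3 or 4 gives the trigram and quadgram variants mentioned in the abstract. On a corpus with unequal class sizes, a resampling step such as SMOTE would be applied to the feature vectors before fitting.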