Bengali paper classification using ensemble machine learning algorithms. (6th December 2022)
- Record Type:
- Journal Article
- Title:
- Bengali paper classification using ensemble machine learning algorithms. (6th December 2022)
- Main Title:
- Bengali paper classification using ensemble machine learning algorithms
- Authors:
- Khan, Niaz Ashraf
Zawad, Emrul Hasan
Rahman, Rashedur M.
- Abstract:
- Text classification is one of the most challenging problems in natural language processing (NLP). Language models are at the heart of NLP. The ability to represent texts as numbers has given rise to many NLP tasks, for example, text categorisation, translation, and summarisation. Unfortunately, NLP for Bengali texts has not yet reached the state-of-the-art level of languages such as English, mostly due to the scarcity of resources and the complexities of Bengali grammar. Consequently, little work has been done in this field. In this paper, we study one word embedding method, Word2vec based on continuous bag of words (CBOW), combined with several ensemble machine learning algorithms: Adaptive Boosting classifiers, Light Gradient Boosting Machine, XGBoost, and random forest classifiers (RFC). The model is trained on a large corpus of Bengali newspaper text comprising 99,283,949 words and 8,284,804 sentences in 392,772 documents. In our experiment, the Word2vec CBOW model with the XGBoost algorithm outperformed the other models, achieving 92.24% accuracy.
- Is Part Of:
- International journal of knowledge engineering and soft data paradigms. Volume 7:Number 2(2022)
- Journal:
- International journal of knowledge engineering and soft data paradigms
- Issue:
- Volume 7:Number 2(2022)
- Issue Display:
- Volume 7, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 7
- Issue:
- 2
- Issue Sort Value:
- 2022-0007-0002-0000
- Page Start:
- 77
- Page End:
- 94
- Publication Date:
- 2022-12-06
- Subjects:
- NLP -- natural language processing -- categorisation -- document classification -- decision tree classifier
Soft computing -- Periodicals
Statistics -- Periodicals
Information science -- Periodicals
003.05
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijkesdp
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1755-3210
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 24713.xml
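
The abstract describes representing texts as numbers with Word2vec and then classifying them with ensemble algorithms. The following is a minimal, standard-library sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the two-dimensional toy vectors stand in for trained Word2vec CBOW embeddings, averaging word vectors into a document vector is one common choice that the record does not confirm, and the three hand-written threshold rules combined by majority vote are a simplified stand-in for the boosting and random forest classifiers named in the abstract.

```python
from collections import Counter

# Hypothetical 2-d vectors standing in for trained Word2vec CBOW embeddings.
# Keys are Bengali tokens: goal, match, vote, election.
EMBEDDINGS = {
    "গোল": [1.0, 0.0],
    "ম্যাচ": [0.8, 0.2],
    "ভোট": [0.0, 1.0],
    "নির্বাচন": [0.2, 0.8],
}


def document_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the vocabulary.

    Returns a zero vector when no token is known (an assumed fallback).
    """
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Three hand-written weak rules (hypothetical); a real ensemble would learn
# its trees from labelled data instead.
VOTERS = [
    lambda v: "sports" if v[0] > 0.5 else "politics",
    lambda v: "politics" if v[1] > 0.5 else "sports",
    lambda v: "sports" if v[0] > v[1] else "politics",
]


def classify(vector, voters=VOTERS):
    """Hard voting: return the label predicted by most voters."""
    votes = Counter(voter(vector) for voter in voters)
    return votes.most_common(1)[0][0]


tokens = ["গোল", "ম্যাচ", "অজানা"]  # the last token is out of vocabulary
vector = document_vector(tokens, EMBEDDINGS)
label = classify(vector)
print(vector, label)
```

In practice the embeddings would come from a trained Word2vec model and the voters would be replaced by a learned ensemble; the sketch only shows how a document becomes a fixed-length vector that any such classifier can consume.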