A Multiple-Layer Machine Learning Architecture for Improved Accuracy in Sentiment Analysis. (27th April 2019)
- Record Type:
- Journal Article
- Title:
- A Multiple-Layer Machine Learning Architecture for Improved Accuracy in Sentiment Analysis. (27th April 2019)
- Main Title:
- A Multiple-Layer Machine Learning Architecture for Improved Accuracy in Sentiment Analysis
- Authors:
- Shyamasundar, L B
Jhansi Rani, P - Abstract:
- Abstract: Twitter is an online micro-blogging platform through which one can explore the hidden valuable and delightful information about the current context at any point of time, which also serves as a data source to carry out sentiment analysis. In this paper, the sentiments of large amount of tweets generated from Twitter in the form of big data have been analyzed using machine learning algorithms. A multi-tier architecture for sentiment classification is proposed in this paper, which includes modules such as tokenization, data cleaning, preprocessing, stemming, updated lexicon, stopwords and emoticon dictionaries, feature selection and machine learning classifier. Unigram and bigrams have been used as feature extractors together with χ 2 (Chi-squared) and Singular Value Decomposition for dimensionality reduction together with two model types (Binary and Reg), with four types of scaling methods (No scaling, Standard, Signed and Unsigned) and represented them in three different vector formats (TF-IDF, Binary and Int). Accuracy is considered as the evaluation standard for random forest and bagged trees classification methods. Sentiments were analyzed through tokenization and having several stages of pre-processing and several combinations of feature vectors and classification methods. Through which it was possible to achieve an accuracy of 84.14%. Obtained results conclude that, the proposed scheme gives a better accuracy when compared with existing schemes in theAbstract: Twitter is an online micro-blogging platform through which one can explore the hidden valuable and delightful information about the current context at any point of time, which also serves as a data source to carry out sentiment analysis. In this paper, the sentiments of large amount of tweets generated from Twitter in the form of big data have been analyzed using machine learning algorithms. A multi-tier architecture for sentiment classification is proposed in this paper, which includes modules such as tokenization, data cleaning, preprocessing, stemming, updated lexicon, stopwords and emoticon dictionaries, feature selection and machine learning classifier. Unigram and bigrams have been used as feature extractors together with χ 2 (Chi-squared) and Singular Value Decomposition for dimensionality reduction together with two model types (Binary and Reg), with four types of scaling methods (No scaling, Standard, Signed and Unsigned) and represented them in three different vector formats (TF-IDF, Binary and Int). Accuracy is considered as the evaluation standard for random forest and bagged trees classification methods. Sentiments were analyzed through tokenization and having several stages of pre-processing and several combinations of feature vectors and classification methods. Through which it was possible to achieve an accuracy of 84.14%. Obtained results conclude that, the proposed scheme gives a better accuracy when compared with existing schemes in the literature. … (more)
- Is Part Of:
- Computer journal. Volume 63:Number 3(2020)
- Journal:
- Computer journal
- Issue:
- Volume 63:Number 3(2020)
- Issue Display:
- Volume 63, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 63
- Issue:
- 3
- Issue Sort Value:
- 2020-0063-0003-0000
- Page Start:
- 395
- Page End:
- 409
- Publication Date:
- 2019-04-27
- Subjects:
- Twitter -- pre-processing -- sentiment analysis -- machine learning -- dimensionality reduction
Computers -- Periodicals
005.1 - Journal URLs:
- http://comjnl.oxfordjournals.org/ ↗
http://ukcatalogue.oup.com/ ↗ - DOI:
- 10.1093/comjnl/bxz038 ↗
- Languages:
- English
- ISSNs:
- 0010-4620
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3394.060000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 15705.xml