A case study of Spanish text transformations for twitter sentiment analysis. (15th September 2017)
- Record Type:
- Journal Article
- Title:
- A case study of Spanish text transformations for twitter sentiment analysis. (15th September 2017)
- Main Title:
- A case study of Spanish text transformations for twitter sentiment analysis
- Authors:
- Tellez, Eric S.
Miranda-Jiménez, Sabino
Graff, Mario
Moctezuma, Daniela
Siordia, Oscar S.
Villaseñor, Elio A.
- Abstract:
- Highlights: A review of popular techniques to model short texts written in an informal style. An analysis of configurations that produce the top-k sentiment classifiers. The analysis is oriented to the performance in both accuracy and computing time. A simple method to create fast and accurate sentiment analysis systems.
Abstract: Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness. Recently, it has received a lot of attention given the interest in opinion mining in micro-blogging platforms. These new forms of textual expression present new challenges for text analysis because of the use of slang and orthographic and grammatical errors, among others. Along with these challenges, a practical sentiment classifier should be able to handle large workloads efficiently. The aim of this research is to identify, in a large set of combinations, which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., word n-grams), and token-weighting schemes make the most impact on the accuracy of a classifier (Support Vector Machine) trained on two Spanish datasets. The methodology used is to exhaustively analyze all combinations of text transformations and their respective parameters to find out what common characteristics the best performing classifiers have. Furthermore, we introduce a novel approach based on the combination of word-based n-grams and character-based q-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word-based combination by 11.17% and 5.62% on the INEGI and TASS'15 datasets, respectively.
- Is Part Of:
- Expert systems with applications. Volume 81(2017)
- Journal:
- Expert systems with applications
- Issue:
- Volume 81(2017)
- Issue Display:
- Volume 81, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 81
- Issue:
- 2017
- Issue Sort Value:
- 2017-0081-2017-0000
- Page Start:
- 457
- Page End:
- 471
- Publication Date:
- 2017-09-15
- Subjects:
- Sentiment analysis -- Error-robust text representations -- Opinion mining
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2017.03.071
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1557.xml
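
Note on the abstract: the combination of word-based n-grams and character-based q-grams it describes can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the lowercasing step, the `q:` prefix used to keep character tokens apart from word tokens, and the default values of n and q are all assumptions chosen for illustration. The record does not state which parameters the article uses.

```python
def word_ngrams(text, n):
    """Return the word n-grams of text as space-joined strings."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text, q):
    """Return the character q-grams of text, after collapsing whitespace runs."""
    s = " ".join(text.split())
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def combined_tokens(text, n_values=(1, 2), q_values=(3,)):
    """Build one token list from word n-grams and character q-grams.

    The 'q:' prefix is a hypothetical convention so that a character
    gram such as 'muy' is not confused with the word 'muy'.
    """
    text = text.lower()  # assumed normalization step
    tokens = []
    for n in n_values:
        tokens.extend(word_ngrams(text, n))
    for q in q_values:
        tokens.extend("q:" + gram for gram in char_qgrams(text, q))
    return tokens


if __name__ == "__main__":
    print(combined_tokens("Muy bueno"))
```

The resulting token list could then be weighted (for example with TF-IDF) and passed to a classifier such as a Support Vector Machine, which is the classifier the abstract names. Character q-grams tend to tolerate misspellings because a single wrong letter changes only a few grams rather than the whole word token.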