Exploiting semantic relationships for unsupervised expansion of sentiment lexicons. (December 2020)
- Record Type:
- Journal Article
- Title:
- Exploiting semantic relationships for unsupervised expansion of sentiment lexicons. (December 2020)
- Main Title:
- Exploiting semantic relationships for unsupervised expansion of sentiment lexicons
- Authors:
- Viegas, Felipe
Alvim, Mário S.
Canuto, Sérgio
Rosa, Thierson
Gonçalves, Marcos André
Rocha, Leonardo
- Abstract:
- The literature in sentiment analysis has widely assumed that semantic relationships between words cannot be effectively exploited to produce satisfactory sentiment lexicon expansions. This assumption stems from the fact that words considered to be "close" in a semantic space (e.g., word embeddings) may present completely opposite polarities, which might suggest that sentiment information in such spaces is either too faint, or at least not readily exploitable.
Our main contribution in this paper is a rigorous and robust challenge to this assumption: by proposing a set of theoretical hypotheses and corroborating them with strong experimental evidence, we demonstrate that semantic relationships can be effectively used for good lexicon expansion. Based on these results, our second contribution is a novel, simple, and yet effective lexicon-expansion strategy based on semantic relationships extracted from word embeddings. This strategy is able to substantially enhance the lexicons, whilst overcoming the major problem of lexicon coverage. We present an extensive experimental evaluation of sentence-level sentiment analysis, comparing our approach to sixteen state-of-the-art (SOTA) lexicon-based and five lexicon expansion methods, over twenty datasets. Results show that in the vast majority of cases our approach outperforms the alternatives, achieving coverage of almost 100% and gains of about 26% against the best baselines. Moreover, our unsupervised approach performed competitively against SOTA supervised sentiment analysis methods, mainly in scenarios with scarce information. Finally, in a cross-dataset comparison, our approach turned out to be as competitive as (i.e., statistically tied with) state-of-the-art supervised solutions such as pre-trained transformers (BERT), even without relying on any training (labeled) data. Indeed, in small datasets or in datasets with scarce information (short messages), our solution outperformed the supervised ones by large margins.
Highlights: We demonstrate that semantic relationships can be effective for lexicon expansion. We propose a novel method that explores distances between word embeddings. Our unsupervised method enhances lexicon coverage while keeping high precision. In our experiments, we beat all lexicon-expansion baselines by large margins. Our approach is competitive with pre-trained transformers (BERT) without any training.
- Is Part Of:
- Information systems. Volume 94(2020)
- Journal:
- Information systems
- Issue:
- Volume 94(2020)
- Issue Display:
- Volume 94, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 94
- Issue:
- 2020
- Issue Sort Value:
- 2020-0094-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-12
- Subjects:
- Sentiment analysis -- Lexicon dictionary -- Word embeddings -- Lexicon expansion
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2020.101606
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14029.xml