Twitter trolls: a linguistic profile of anti-democratic discourse. (May 2020)
- Record Type:
- Journal Article
- Title:
- Twitter trolls: a linguistic profile of anti-democratic discourse. (May 2020)
- Main Title:
- Twitter trolls: a linguistic profile of anti-democratic discourse
- Authors:
- Lundberg, Jonas
Laitinen, Mikko
- Abstract:
- This article focuses on anti-democratic discourse and investigates the linguistic profile of Twitter trolls. The troll data consist of some 3.5 million messages in English obtained through Twitter in late 2018. These data originate from potentially state-backed information operations aimed at sowing discord in Western societies. The baseline data, against which the troll data are compared, contain circa 4.4 million messages in English drawn from the Nordic Tweet Stream corpus. A machine learning application that enables us to select genuine personal messages in this corpus is used to prune the data. The empirical part investigates frequency-based characteristics of the two datasets. We utilize a set of automatically-extracted word-list information and the observed frequencies of personal pronouns. Our empirical findings show considerable quantitative differences so that the troll data are shorter, make use of a smaller number of lexical types and tokens, and resemble more formal registers, while the personal messages are more spoken-like. The results could be used to improve automated detection systems whose purpose is to identify troll accounts.
Highlights: Corpus-linguistic analysis of known Twitter troll data. Baseline data consist of genuine personal outward Twitter communication. Methods use automatically-extracted word-list information and personal pronouns. Empirical part shows considerable differences between the datasets. Identifying trolls requires socio-cultural information to complement data mining.
- Is Part Of:
- Language sciences. Volume 79(2020)
- Journal:
- Language sciences
- Issue:
- Volume 79(2020)
- Issue Display:
- Volume 79, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 79
- Issue:
- 2020
- Issue Sort Value:
- 2020-0079-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-05
- Subjects:
- Social media trolls -- Twitter -- Anti-democratization -- Discourse style -- Personal pronouns -- English as a lingua franca
Linguistics -- Periodicals
Language and languages -- Periodicals
Linguistique -- Périodiques
Langage et langues -- Périodiques
Language and languages
Linguistics
Periodicals
Electronic journals
405
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03880001
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.langsci.2019.101268
- Languages:
- English
- ISSNs:
- 0388-0001
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5155.711700
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13462.xml