Influence of social conversational features on language identification in highly multilingual online conversations. Issue 1 (January 2019)
- Record Type:
- Journal Article
- Title:
- Influence of social conversational features on language identification in highly multilingual online conversations. Issue 1 (January 2019)
- Main Title:
- Influence of social conversational features on language identification in highly multilingual online conversations
- Authors:
- Sarma, Neelakshi
Singh, Sanasam Ranbir
Goswami, Diganta
- Abstract:
- Highlights: ALI in highly multilingual conversations on social media platforms. Text refinement using social conversational features. Incorporated proposed text refinement with various classification methods. Created new datasets.
Abstract: With the explosion of multilingual content on the Web, particularly on social media platforms, identification of the languages present in a text is becoming an important task for various applications. While automatic language identification (ALI) in social media text is considered a non-trivial task due to the presence of slang words, misspellings, creative spellings and special elements such as hashtags, user mentions, etc., ALI in a multilingual environment becomes an even more challenging task. In a highly multilingual society, code-mixing without affecting the underlying language sense has become a natural phenomenon. In such a dynamic environment, conversational text alone often fails to identify the underlying languages present in the text. This paper proposes various methods of exploiting social conversational features for enhancing ALI performance. Although social conversational features for ALI have been explored previously using methods like probabilistic language modeling, these models often fail to address issues related to code-mixing, phonetic typing, out-of-vocabulary words, etc., which are prevalent in a highly multilingual environment. This paper differs in the way the social conversational features are used to propose text refinement strategies that are suitable for ALI in a highly multilingual environment. The contributions of this paper therefore include the following. First, this paper analyzes the characteristics of various social conversational features by exploiting language usage patterns. Second, various methods of text refinement suitable for language identification are proposed. Third, the effects of the proposed refinement methods are investigated using various sentence-level language identification frameworks. From various experimental observations over three conversational datasets collected from the Facebook, YouTube and Twitter social media platforms, it is evident that the proposed method of ALI using social conversational features outperforms its baseline counterparts.
- Is Part Of:
- Information processing & management. Volume 56:Issue 1(2019:Jan.)
- Journal:
- Information processing & management
- Issue:
- Volume 56:Issue 1(2019:Jan.)
- Issue Display:
- Volume 56, Issue 1 (2019)
- Year:
- 2019
- Volume:
- 56
- Issue:
- 1
- Issue Sort Value:
- 2019-0056-0001-0000
- Page Start:
- 151
- Page End:
- 166
- Publication Date:
- 2019-01
- Subjects:
- Language identification -- Multilingual -- Social conversational features -- Convolutional neural network
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2018.09.009
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9140.xml