Spatial Information Extraction from Short Messages. (1st April 2018)
- Record Type:
- Journal Article
- Title:
- Spatial Information Extraction from Short Messages. (1st April 2018)
- Main Title:
- Spatial Information Extraction from Short Messages
- Authors:
- Zenasni, Sarah
Kergosien, Eric
Roche, Mathieu
Teisseire, Maguelonne
- Abstract:
- Highlights: Proposal of original methods for spatial entity and relation extraction. Extraction of new variations of spatial entities from SMS and tweets. Extraction of new spatial relations and new variations of spatial relations. Our method achieved F-measure scores of 0.84 and 0.86 for SMS and tweets, respectively. Our method outperforms the state-of-the-art tools (i.e. Polyglot, Stanford NER).
Abstract: Texts, in addition to maps and satellite images, have become an important spatial data resource in recent years. Electronic written texts used in mediated interactions, especially short messages, have triggered the emergence of new ways of writing. Extracting information from such short messages, which represent a rich source of information, is highly important in order to discover domain-relevant information in the text and facilitate information retrieval. However, short messages are hard to analyse because of their brief, unstructured and informal nature. This paper focuses on the kinds of special or unique spatial entities and relations that are contained in short messages. A new entity extraction method specifically dedicated to French short messages (SMS and tweets) is outlined to address this issue. The method is then tested on more traditional sources, such as newspaper texts. This work is crucial in order to take advantage of the vast amount of geographical knowledge expressed in heterogeneous unstructured data. Firstly, we propose a process in which new spatial entities are extracted (e.g. motpellier, montpelier and Montpel are associated with Montpellier). Secondly, we identify new spatial relations that precede spatial entities (e.g. sur, par). Finally, we propose general patterns for the extraction of spatial relations. The task is very challenging and complex due to the specificity of short message language, which is based on weakly standardized modes of writing. The experiments were carried out on three French corpora (i.e. 88milSMS, tweets, and Midi Libre) and highlight the efficiency of our proposal for identifying new kinds of spatial entities and relations.
- Is Part Of:
- Expert systems with applications. Volume 95(2018)
- Journal:
- Expert systems with applications
- Issue:
- Volume 95(2018)
- Issue Display:
- Volume 95, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 95
- Issue:
- 2018
- Issue Sort Value:
- 2018-0095-2018-0000
- Page Start:
- 351
- Page End:
- 367
- Publication Date:
- 2018-04-01
- Subjects:
- Spatial entities -- Spatial relations -- Similarity measures -- Short message corpora -- Text mining
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2017.11.025
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5493.xml