Classifying natural-language spatial relation terms with random forest algorithm. Issue 3 (4th March 2017)
- Record Type:
- Journal Article
- Title:
- Classifying natural-language spatial relation terms with random forest algorithm. Issue 3 (4th March 2017)
- Main Title:
- Classifying natural-language spatial relation terms with random forest algorithm
- Authors:
- Du, Shihong
Wang, Xiaonan
Feng, Chen-Chieh
Zhang, Xiuyuan
- Abstract:
- ABSTRACT: The exponential growth of natural language text data in social media has contributed a rich data source for geographic information. However, incorporating such a data source into GIS analysis faces tremendous challenges, as existing GIS data tend to be geometry based while natural language text data tend to rely on natural language spatial relation (NLSR) terms. To alleviate this problem, one critical step is to translate geometric configurations into NLSR terms, but existing methods to date (e.g. mean value or decision tree algorithm) are insufficient to obtain a precise translation. This study addresses this issue by adopting the random forest (RF) algorithm to automatically learn a robust mapping model from a large number of samples and to evaluate the importance of each variable for each NLSR term. Because the semantic similarity of the collected terms reduces the classification accuracy, different grouping schemes of NLSR terms are used, and their influences on classification results are evaluated. The experiment results demonstrate that the learned model can accurately transform geometric configurations into NLSR terms, and that recognizing different groups of terms requires different sets of variables. More importantly, the results of variable importance evaluation indicate that topology types determined by the 9-intersection model are less important than metric variables in defining NLSR terms, which contrasts with the assertion of 'topology matters, metric refines' in existing studies. …
- Is Part Of:
- International journal of geographical information science. Volume 31:Issue 3(2017)
- Journal:
- International journal of geographical information science
- Issue:
- Volume 31:Issue 3(2017)
- Issue Display:
- Volume 31, Issue 3 (2017)
- Year:
- 2017
- Volume:
- 31
- Issue:
- 3
- Issue Sort Value:
- 2017-0031-0003-0000
- Page Start:
- 542
- Page End:
- 568
- Publication Date:
- 2017-03-04
- Subjects:
- Natural-language spatial relations -- topological terms -- metric variables -- random forest -- social media data -- geographical information retrieval
Geography -- Data processing -- Periodicals
Information storage and retrieval systems -- Periodicals
Géomatique -- Périodiques
Systèmes d'information -- Périodiques
910.285
- Journal URLs:
- http://www.tandfonline.com/loi/tgis20
http://www.tandfonline.com/
- DOI:
- 10.1080/13658816.2016.1212356
- Languages:
- English
- ISSNs:
- 1365-8816
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.266150
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1061.xml