Crowdsourcing the character of a place: Character‐level convolutional networks for multilingual geographic text classification. Issue 2 (29th January 2018)
- Record Type:
- Journal Article
- Title:
- Crowdsourcing the character of a place: Character‐level convolutional networks for multilingual geographic text classification. Issue 2 (29th January 2018)
- Main Title:
- Crowdsourcing the character of a place: Character‐level convolutional networks for multilingual geographic text classification
- Authors:
- Adams, Benjamin
- McKenzie, Grant
- Abstract:
- This article presents a new character‐level convolutional neural network model that can classify multilingual text written using any character set that can be encoded with UTF‐8, a standard and widely used 8‐bit character encoding. For geographic classification of text, we demonstrate that this approach is competitive with state‐of‐the‐art word‐based text classification methods. The model was tested on four crowdsourced data sets made up of Wikipedia articles, online travel blogs, Geonames toponyms, and Twitter posts. Unlike word‐based methods, which require data cleaning and pre‐processing, the proposed model works for any language without modification and with classification accuracy comparable to existing methods. Using a synthetic data set with introduced character‐level errors, we show it is more robust to noise than word‐level classification algorithms. The results indicate that UTF‐8 character‐level convolutional neural networks are a promising technique for georeferencing noisy text, such as found in colloquial social media posts and texts scanned with optical character recognition. However, word‐based methods currently require less computation time to train, so currently are preferable for classifying well‐formatted and cleaned texts in single languages.
- Is Part Of:
- Transactions in GIS. Volume 22:Issue 2(2018)
- Journal:
- Transactions in GIS
- Issue:
- Volume 22:Issue 2(2018)
- Issue Display:
- Volume 22, Issue 2 (2018)
- Year:
- 2018
- Volume:
- 22
- Issue:
- 2
- Issue Sort Value:
- 2018-0022-0002-0000
- Page Start:
- 394
- Page End:
- 408
- Publication Date:
- 2018-01-29
- Subjects:
- Geographic information systems -- Periodicals
- 910.285
- Journal URLs:
- http://www.blackwell-synergy.com/servlet/useragent?func=showIssues&code=tgis
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/tgis.12317
- Languages:
- English
- ISSNs:
- 1361-1682
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 9020.502000
- British Library DSC - BLDSS-3PM
- British Library HMNTS - ELD Digital store
- Ingest File:
- 6405.xml