Chinese address standardisation via hybrid approach combining statistical and rule-based methods. (22nd October 2019)
- Record Type:
- Journal Article
- Title:
- Chinese address standardisation via hybrid approach combining statistical and rule-based methods. (22nd October 2019)
- Main Title:
- Chinese address standardisation via hybrid approach combining statistical and rule-based methods
- Authors:
- Chen, Xi
Fang, Cheng
Chang, Jasmine
Yang, Yanjiang
Hong, Yuan
Lu, Haibing
- Abstract:
- This paper is derived from the research project on cleansing customer address data for the State Grid Corporation of China (SGCC), the largest electric utility company in the world, which was ranked 2nd in the 2016 Fortune Global 500. Address standardisation involves developing a standard address format for data integration, de-duplication, and automatic address correction/completion, and it is widely considered a very challenging data cleansing task. Address standardisation is critical for routine business tasks, customer relationship management, business intelligence for customer-oriented corporations, and other applications. It is particularly difficult for the Chinese language, for several reasons: 1) the current address standard in place in China is only realised at the city/town level; 2) for a number of reasons, many hand-written addresses are incomplete or contain errors; 3) the characteristics of the Chinese language make it difficult to process by machine. To tackle these challenges, we propose a hybrid approach combining statistical and rule-based methods, the two mainstream approaches to address standardisation. Our hybrid approach exploits the merits of both methods and can complete the address standardisation task with little human effort and computational time, while achieving high accuracy.
- Is Part Of:
- International journal of internet and enterprise management. Volume 9:Number 2(2019)
- Journal:
- International journal of internet and enterprise management
- Issue:
- Volume 9:Number 2(2019)
- Issue Display:
- Volume 9, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 9
- Issue:
- 2
- Issue Sort Value:
- 2019-0009-0002-0000
- Page Start:
- 179
- Page End:
- 193
- Publication Date:
- 2019-10-22
- Subjects:
- natural language processing -- Chinese address -- machine learning -- rule-based method
Internet -- Periodicals
Business information services -- Periodicals
Information storage and retrieval systems -- Management -- Periodicals
Business enterprises -- Computer networks -- Periodicals
004.678
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijiem
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1476-1300
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11615.xml