A general framework for big data knowledge discovery and integration. (25th January 2018)
- Record Type:
- Journal Article
- Title:
- A general framework for big data knowledge discovery and integration. (25th January 2018)
- Main Title:
- A general framework for big data knowledge discovery and integration
- Authors:
- Wang, Xinyang
Qi, Deyu
Lin, Weiwei
Yu, MinCong
Zheng, Zhishuo
Zhou, Naqin
Chen, Pengguang
- Abstract:
- Summary: Data structure description, conceptual modeling, and logic reasoning for knowledge discovery are three critical factors in integrating heterogeneous information. In particular, NoSQL databases and the Internet of Things create an urgent need for a uniform expression of heterogeneous data, yet little attention has been paid to research on integrating NoSQL databases with traditional data models or on the semantic description of big data. To tackle these problems, this paper first proposes a concept‐and‐relation‐oriented grid data model, called GODM, based on the definitions of Monad, Compounder, Relation, etc. The GODM model is then used to uniformly describe traditional and NoSQL data models, which eliminates the structural differences of heterogeneous data. Next, based on the GODM relation mechanism, an extensible semantic system is built, using SHOIQ(D) description logic as the example to establish a correspondence with a subset of the GODM grammar; this provides fundamental support for semantic integration and knowledge discovery of heterogeneous data. After that, GODM is compared comprehensively with other models, with particular attention to the distinctions between GODM and OWL in relation mechanism, hybrid schema, description logic, grammatical constructors, etc. In addition, after a general evaluation model is proposed, the time and space efficiency of some primary common data models is evaluated experimentally, and the results show that the GODM model has a great advantage in expressiveness, flexibility, etc., and particularly in time and space efficiency. In summary, the GODM model describes heterogeneous data in terms of both data structure and semantic relationships and realizes a hybrid schema that reconciles schemaful and schemaless data models, making it especially suitable for dynamic data integration and knowledge discovery from big data models.
- Is Part Of:
- Concurrency and computation. Volume 30:Number 13(2018)
- Journal:
- Concurrency and computation
- Issue:
- Volume 30:Number 13(2018)
- Issue Display:
- Volume 30, Issue 13 (2018)
- Year:
- 2018
- Volume:
- 30
- Issue:
- 13
- Issue Sort Value:
- 2018-0030-0013-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2018-01-25
- Subjects:
- data integration -- data model -- GODM -- hybrid schema -- knowledge representation -- NoSQL -- time and space efficiency
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.4422
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 6821.xml