Analysis of structured data on Wikipedia. (10th August 2021)
- Record Type:
- Journal Article
- Title:
- Analysis of structured data on Wikipedia. (10th August 2021)
- Main Title:
- Analysis of structured data on Wikipedia
- Authors:
- Moreira, Johny
Neto, Everaldo Costa
Barbosa, Luciano
- Abstract:
- Wikipedia has been widely used for information consumption or for implementing solutions using its content. It contains primarily unstructured text about entities, but it can also contain infoboxes, which are structured attributes describing these entities. Owing to their structured nature, infoboxes have proved useful in many applications. In this work, we perform an extensive data analysis on different aspects of Wikipedia structured data (infoboxes, templates and categories), aiming to uncover data issues and limitations, and to guide researchers in the use of these structured data. We devise a framework to process, index and query the Wikipedia data, using it to analyse different scenarios such as the popularity of infoboxes, their size distribution and usage across categories. Some of our findings are: only 54% of Wikipedia articles have infoboxes; there is a considerable amount of geographical and temporal information in infoboxes; and there is great heterogeneity of infoboxes within the same category.
- Is Part Of:
- International journal of metadata, semantics and ontologies. Volume 15:Number 1(2021)
- Journal:
- International journal of metadata, semantics and ontologies
- Issue:
- Volume 15:Number 1(2021)
- Issue Display:
- Volume 15, Issue 1 (2021)
- Year:
- 2021
- Volume:
- 15
- Issue:
- 1
- Issue Sort Value:
- 2021-0015-0001-0000
- Page Start:
- 71
- Page End:
- 86
- Publication Date:
- 2021-08-10
- Subjects:
- metadata -- knowledge management -- structured data -- data analysis -- Wikipedia -- infoboxes -- indexing strategy -- categories -- templates -- entities
Metadata -- Periodicals
Semantic Web -- Periodicals
Ontologies (Information retrieval) -- Periodicals
Data structures (Computer science) -- Periodicals
Information theory -- Periodicals
005.74
- Journal URLs:
- http://www.inderscience.com/browse/index.php?journalID=152
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1744-2621
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 16265.xml