A sensitivity analysis of factors influential to the popularity of shared data in data repositories. Issue 3 (August 2021)
- Record Type:
- Journal Article
- Title:
- A sensitivity analysis of factors influential to the popularity of shared data in data repositories. Issue 3 (August 2021)
- Main Title:
- A sensitivity analysis of factors influential to the popularity of shared data in data repositories
- Authors:
- Xie, Qing
Wang, Jiamin
Kim, Giyeong
Lee, Soobin
Song, Min
- Abstract:
- Highlights: This paper used a neural network based method to analyze factors influencing dataset popularity in UCI data repositories. Sensitivity degree as a weight was used to re-rank the datasets in order to predict high-popularity datasets. We examined whether the relationship between factors and popularity differs depending on the subject domain in the UCI. GitHub data repository was used for evaluating the applicability of the proposed framework to other types of factor analysis.
Abstract: With their rapid development, data repositories usually provide abundant metadata—including data types, keywords, downloads, stars, forks, and citations—along with the data content. These rich metadata can be used as valuable resources to study the factors that facilitate data sharing. However, few previous studies have attempted to study which metadata are correlated with the popularity of data. This study overcomes these issues by extracting the major factors for each dataset from a well-known data repository, the UCI Machine Learning Repository, and a popular open-source software repository, GitHub. We trained a neural network model and measured the influence of these features on quantified popularity metrics using the weight product of connecting neurons. We grouped the UCI factors into two categories (intrinsic and extrinsic) and the GitHub factors into three categories (intrinsic, extrinsic, and web-related) to analyze their influence on popularity at each level. The quantified influence was used to predict the popularity of the data or software. We conducted a statistical analysis to explore the relationship between these factors and popularity with five different domains (life sciences, physical sciences, computer science/engineering, social sciences, and others) for the UCI repository. This study's findings contribute to understanding the factors that affect the popularity of open datasets or software for providing guidance on data sharing, reuse, and organization.
- Is Part Of:
- Journal of informetrics. Volume 15:Issue 3(2021)
- Journal:
- Journal of informetrics
- Issue:
- Volume 15:Issue 3(2021)
- Issue Display:
- Volume 15, Issue 3 (2021)
- Year:
- 2021
- Volume:
- 15
- Issue:
- 3
- Issue Sort Value:
- 2021-0015-0003-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-08
- Subjects:
- Data repository -- Sensitivity analysis -- Neural network -- UCI repository -- GitHub
Library statistics -- Periodicals
Information science -- Statistical methods -- Periodicals
Bibliometrics -- Periodicals
Bibliothèques -- Statistiques -- Périodiques
Sciences de l'information -- Méthodes statistiques -- Périodiques
Bibliométrie -- Périodiques
020.727
- Journal URLs:
- http://www.journals.elsevier.com/journal-of-informetrics/
http://rave.ohiolink.edu/ejournals/issn/17511577/
http://www.sciencedirect.com/science/journal/17511577
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.joi.2021.101142
- Languages:
- English
- ISSNs:
- 1751-1577
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5006.830000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19339.xml