Exploring and modelling team performances of the Kaggle European Soccer database. (February 2019)
- Record Type:
- Journal Article
- Title:
- Exploring and modelling team performances of the Kaggle European Soccer database. (February 2019)
- Main Title:
- Exploring and modelling team performances of the Kaggle European Soccer database
- Authors:
- Carpita, Maurizio
Ciavolino, Enrico
Pasca, Paola
- Other Names:
- Groll Andreas guest-editor.
Manisera Marica guest-editor.
Schauberger Gunther guest-editor.
Zuccolotto Paola guest-editor.
- Abstract:
- This study explores a big and open database of soccer leagues in 10 European countries. Data related to players, teams and matches covering seven seasons (from 2009/2010 to 2015/2016) were retrieved from Kaggle, an online platform in which big data are available for predictive modelling and analytics competition among data scientists. Based on both preliminary data analysis, experts' evaluation and players' position on the football pitch, role-based indicators of teams' performance have been built and used to estimate the win probability of the home team with the binomial logistic regression (BLR) model that has been extended including the ELO rating predictor and two random effects due to the hierarchical structure of the dataset. The predictive power of the BLR model and its extensions has been compared with the one of other statistical modelling approaches (Random Forest, Neural Network, k-NN, Naïve Bayes). Results showed that role-based indicators substantially improved the performance of all the models used in both this work and in previous works available on Kaggle. The base BLR model increased prediction accuracy by 10 percentage points, and showed the importance of defence performances, especially in the last seasons. Inclusion of both ELO rating predictor and the random effects did not substantially improve prediction, as the simpler BLR model performed equally good. With respect to the other models, only Naïve Bayes showed more balanced results in predicting both win and no-win of the home team.
- Is Part Of:
- Statistical modelling. Volume 19:Number 1(2019)
- Journal:
- Statistical modelling
- Issue:
- Volume 19:Number 1(2019)
- Issue Display:
- Volume 19, Issue 1 (2019)
- Year:
- 2019
- Volume:
- 19
- Issue:
- 1
- Issue Sort Value:
- 2019-0019-0001-0000
- Page Start:
- 74
- Page End:
- 101
- Publication Date:
- 2019-02
- Subjects:
- Kaggle European Soccer (KES) database -- binomial logistic regression (BLR) model -- role-based player performance indicators -- prediction of match results -- comparison of classification models -- statistical learning models
Linear models (Statistics) -- Periodicals
Mathematical models -- Periodicals
Modèles linéaires (Statistique) -- Périodiques
Modèles mathématiques -- Périodiques
Modèle statistique
Modèle linéaire
Modélisation statistique
Périodique électronique (Descripteur de forme)
Ressource Internet (Descripteur de forme)
519.5011
- Journal URLs:
- http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
http://firstsearch.oclc.org/journal=1471-082x;screen=info;ECOIP
- DOI:
- 10.1177/1471082X18810971
- Languages:
- English
- ISSNs:
- 1471-082X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9771.xml