Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. (March 2018)
- Record Type:
- Journal Article
- Title:
- Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. (March 2018)
- Main Title:
- Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation
- Authors:
- Meyer, Hanna
Reudenbach, Christoph
Hengl, Tomislav
Katurji, Marwan
Nauss, Thomas
- Abstract:
- Importance of target-oriented validation strategies for spatio-temporal prediction models is illustrated using two case studies: (1) modelling of air temperature (T_air) in Antarctica, and (2) modelling of volumetric water content (VW) for the R.J. Cook Agronomy Farm, USA. Performance of a random k-fold cross-validation (CV) was compared to three target-oriented strategies: Leave-Location-Out (LLO), Leave-Time-Out (LTO), and Leave-Location-and-Time-Out (LLTO) CV. Results indicate that considerable differences between random k-fold (R² = 0.9 for T_air and 0.92 for VW) and target-oriented CV (LLO R² = 0.24 for T_air and 0.49 for VW) exist, highlighting the need for target-oriented validation to avoid an overoptimistic view on models. Differences between random k-fold and target-oriented CV indicate spatial over-fitting caused by misleading variables. To decrease over-fitting, a forward feature selection in conjunction with target-oriented CV is proposed. It decreased over-fitting and simultaneously improved target-oriented performances (LLO CV R² = 0.47 for T_air and 0.55 for VW).
Highlights: Random k-fold cross-validation (CV) leads to an overoptimistic view on model results. Target-oriented CV is required for reliable error estimates of space-time models. Temporally static variables can lead to spatial over-fitting. Feature selection in conjunction with target-oriented CV can reduce over-fitting. The proposed modelling framework should become common practice.
- Is Part Of:
- Environmental modelling & software. Volume 101(2018)
- Journal:
- Environmental modelling & software
- Issue:
- Volume 101(2018)
- Issue Display:
- Volume 101, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 101
- Issue:
- 2018
- Issue Sort Value:
- 2018-0101-2018-0000
- Page Start:
- 1
- Page End:
- 9
- Publication Date:
- 2018-03
- Subjects:
- Cross-validation -- Feature selection -- Over-fitting -- Random forest -- Spatio-temporal -- Target-oriented validation
Environmental monitoring -- Computer programs -- Periodicals
Ecology -- Computer simulation -- Periodicals
Digital computer simulation -- Periodicals
Computer software -- Periodicals
Environmental Monitoring -- Periodicals
Computer Simulation -- Periodicals
Environnement -- Surveillance -- Logiciels -- Périodiques
Écologie -- Simulation, Méthodes de -- Périodiques
Simulation par ordinateur -- Périodiques
Logiciels -- Périodiques
Computer software
Digital computer simulation
Ecology -- Computer simulation
Environmental monitoring -- Computer programs
Periodicals
Electronic journals
363.70015118
- Journal URLs:
- http://www.sciencedirect.com/science/journal/13648152
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.envsoft.2017.12.001
- Languages:
- English
- ISSNs:
- 1364-8152
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3791.522800
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11564.xml
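
The Leave-Location-Out (LLO) and Leave-Time-Out (LTO) strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `group_folds`, the round-robin assignment of groups to folds, and the toy station and day labels are all assumptions made for illustration. The only idea taken from the abstract is that whole locations (or whole time steps) are held out together, rather than individual samples at random.

```python
def group_folds(groups, k):
    """Yield (train_idx, test_idx) pairs for a grouped k-fold split.

    Every sample sharing a group label (e.g. a station ID for LLO,
    a date for LTO) lands in the same fold, so a held-out group is
    never seen during training. Hypothetical helper for illustration.
    """
    unique = sorted(set(groups))
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of distinct groups")
    # Deterministic round-robin assignment of groups to folds (an assumption;
    # a random assignment would serve equally well).
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Hypothetical toy data: three stations, each observed on two days.
stations = ["A", "A", "B", "B", "C", "C"]
days = [1, 2, 1, 2, 1, 2]

llo = list(group_folds(stations, k=3))  # Leave-Location-Out: hold out whole stations
lto = list(group_folds(days, k=2))      # Leave-Time-Out: hold out whole days
```

In a random k-fold split, samples from station "A" would typically appear in both the training and the test set, which is the situation the abstract associates with overoptimistic error estimates. With the grouped split above, each test fold contains only stations (or days) that are absent from the corresponding training fold.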