A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide. (September 2019)
- Record Type:
- Journal Article
- Title:
- A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide. (September 2019)
- Main Title:
- A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide
- Authors:
- Chen, Jie
de Hoogh, Kees
Gulliver, John
Hoffmann, Barbara
Hertel, Ole
Ketzel, Matthias
Bauwelinck, Mariska
van Donkelaar, Aaron
Hvidtfeldt, Ulla A.
Katsouyanni, Klea
Janssen, Nicole A.H.
Martin, Randall V.
Samoli, Evangelia
Schwartz, Per E.
Stafoggia, Massimo
Bellander, Tom
Strak, Maciek
Wolf, Kathrin
Vienneau, Danielle
Vermeulen, Roel
Brunekreef, Bert
Hoek, Gerard
- Abstract:
- Empirical spatial air pollution models have been applied extensively to assess exposure in epidemiological studies with increasingly sophisticated and complex statistical algorithms beyond ordinary linear regression. However, different algorithms have rarely been compared in terms of their predictive ability. This study compared 16 algorithms to predict annual average fine particle (PM2.5) and nitrogen dioxide (NO2) concentrations across Europe. The evaluated algorithms included linear stepwise regression, regularization techniques and machine learning methods. Air pollution models were developed based on the 2010 routine monitoring data from the AIRBASE dataset maintained by the European Environmental Agency (543 sites for PM2.5 and 2399 sites for NO2), using satellite observations, dispersion model estimates and land use variables as predictors. We compared the models by performing five-fold cross-validation (CV) and by external validation (EV) using annual average concentrations measured at 416 (PM2.5) and 1396 sites (NO2) from the ESCAPE study. We further assessed the correlations between predictions by each pair of algorithms at the ESCAPE sites. For PM2.5, the models performed similarly across algorithms with a mean CV R² of 0.59 and a mean EV R² of 0.53. Generalized boosted machine, random forest and bagging performed best (CV R² ~0.63; EV R² 0.58–0.61), while backward stepwise linear regression, support vector regression and artificial neural network performed less well (CV R² 0.48–0.57; EV R² 0.39–0.46). Most of the PM2.5 model predictions at ESCAPE sites were highly correlated (R² > 0.85, with the exception of predictions from the artificial neural network). For NO2, the models performed even more similarly across different algorithms, with CV R² values ranging from 0.57 to 0.62, and EV R² values ranging from 0.49 to 0.51. The predicted concentrations from all algorithms at ESCAPE sites were highly correlated (R² > 0.9). For both pollutants, biases were low for all models except the artificial neural network. Dispersion model estimates and satellite observations were two of the most important predictors for PM2.5 models whilst dispersion model estimates and traffic variables were most important for NO2 models in all algorithms that allow assessment of the importance of variables.
Different statistical algorithms performed similarly when modelling spatial variation in annual average air pollution concentrations using a large number of training sites. Highlights: Multiple statistical algorithms with very different assumptions were compared. Despite the difference in modeling frameworks, predictions among the models exhibit generally good agreement. The use of an external evaluation dataset strengthens evaluation by cross-validation.
- Is Part Of:
- Environment international. Volume 130(2019)
- Journal:
- Environment international
- Issue:
- Volume 130(2019)
- Issue Display:
- Volume 130, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 130
- Issue:
- 2019
- Issue Sort Value:
- 2019-0130-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-09
- Subjects:
- Land use regression -- Fine particles -- Nitrogen dioxide -- Machine learning
Environmental protection -- Periodicals
Environmental health -- Periodicals
Environmental monitoring -- Periodicals
Environmental Monitoring -- Periodicals
Environnement -- Protection -- Périodiques
Hygiène du milieu -- Périodiques
Environnement -- Surveillance -- Périodiques
Environmental health
Environmental monitoring
Environmental protection
Periodicals
333.705
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01604120
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.envint.2019.104934
- Languages:
- English
- ISSNs:
- 0160-4120
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3791.330000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16294.xml