Prediction of PM2.5 concentrations at the locations of monitoring sites measuring PM10 and NOx, using generalized additive models and machine learning methods: A case study in London. (1st November 2020)
- Record Type:
- Journal Article
- Title:
- Prediction of PM2.5 concentrations at the locations of monitoring sites measuring PM10 and NOx, using generalized additive models and machine learning methods: A case study in London. (1st November 2020)
- Main Title:
- Prediction of PM2.5 concentrations at the locations of monitoring sites measuring PM10 and NOx, using generalized additive models and machine learning methods: A case study in London
- Authors:
- Analitis, Antonis
Barratt, Benjamin
Green, David
Beddows, Andrew
Samoli, Evangelia
Schwartz, Joel
Katsouyanni, Klea
- Abstract:
- The adverse health effects of air pollutants, especially those of PM2.5, are well documented. However, a lack of adequate monitoring and weaknesses in modelling approaches do not allow a good assessment of health effects in many areas of the World. Advances in computational methods and the availability of new data sets, e.g. satellite remote observations, have enlarged the possibilities of modelling for application in large scale health effects studies. However, PM2.5 monitoring is very recent in most of the World and more limited compared to other pollutants, and understanding how to use PM10 monitors to estimate PM2.5 exposure is therefore important. Since interest in these methods is relatively recent, there is a need for testing their performance against ambient measurements, but long term PM2.5 datasets are less readily available than PM10 in many regions. In the present study we report the methodology and results of using regression modelling and a machine learning method (Random Forest-RF), as well as a combination of the two, to enhance a PM2.5 measurement data base in London using PM10 and NOx measurements as well as other predictors and compare the relative performance of each method. We found that the combination of predictions by the regression model and the RF performs best and we obtain a cross-validation R² of 99.29% and 98.22% for the 5-year periods 2004–2008 and 2009–2013, respectively, and a Mean Square Error near 1. Our enhanced data base for PM2.5 is available for use by other researchers.
Highlights: A methodology based on regression and machine-learning models was developed to enhance the availability of a PM2.5 measurements data base. London dense monitoring network was used to predict PM2.5 concentrations, using other pollutants, meteorological and land-use variables. The combination of regression and machine learning methods leads to improved predictions throughout the range of PM2.5 concentrations. Given the scarcity of PM2.5 measurements across the world, this methodology could aid the verification of exposure models. The developed PM2.5 data base is available for use in health studies.
- Is Part Of:
- Atmospheric environment. Volume 240(2020)
- Journal:
- Atmospheric environment
- Issue:
- Volume 240(2020)
- Issue Display:
- Volume 240, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 240
- Issue:
- 2020
- Issue Sort Value:
- 2020-0240-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-11-01
- Subjects:
- PM2.5 prediction -- Environmental exposure -- Random forest -- Ensemble methods -- London case study
Air -- Pollution -- Periodicals
Air -- Pollution -- Meteorological aspects -- Periodicals
551.51
- Journal URLs:
- http://www.sciencedirect.com/web-editions/journal/13522310
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.atmosenv.2020.117757
- Languages:
- English
- ISSNs:
- 1352-2310
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1767.120000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13981.xml