Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city. (February 2019)
- Record Type:
- Journal Article
- Title:
- Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city. (February 2019)
- Main Title:
- Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city
- Authors:
- Yuchi, Weiran
Gombojav, Enkhjargal
Boldbaatar, Buyantushig
Galsuren, Jargalsaikhan
Enkhmaa, Sarangerel
Beejin, Bolor
Naidan, Gerel
Ochir, Chimedsuren
Legtseg, Bayarkhuu
Byambaa, Tsogtbaatar
Barn, Prabjit
Henderson, Sarah B.
Janes, Craig R.
Lanphear, Bruce P.
McCandless, Lawrence C.
Takaro, Tim K.
Venners, Scott A.
Webster, Glenys M.
Allen, Ryan W.
- Abstract:
- Background: Indoor and outdoor fine particulate matter (PM2.5) are both leading risk factors for death and disease, but making indoor measurements is often infeasible for large study populations.
Methods: We developed models to predict indoor PM2.5 concentrations for pregnant women who were part of a randomized controlled trial of portable air cleaners in Ulaanbaatar, Mongolia. We used multiple linear regression (MLR) and random forest regression (RFR) to model indoor PM2.5 concentrations with 447 independent 7-day PM2.5 measurements and 87 potential predictor variables obtained from outdoor monitoring data, questionnaires, home assessments, and geographic data sets. We also developed blended models that combined the MLR and RFR approaches. All models were evaluated in a 10-fold cross-validation.
Results: The predictors in the MLR model were season, outdoor PM2.5 concentration, the number of air cleaners deployed, and the density of gers (traditional felt-lined yurts) surrounding the apartments. MLR and RFR had similar performance in cross-validation (R² = 50.2% and 48.9%, respectively). The blended MLR model that included RFR predictions had the best performance (cross-validation R² = 81.5%). Intervention status alone explained only 6.0% of the variation in indoor PM2.5 concentrations.
Conclusions: We predicted a moderate amount of variation in indoor PM2.5 concentrations using easily obtained predictor variables, and the models explained substantially more variation than intervention status alone. While RFR shows promise for modelling indoor concentrations, our results highlight the importance of out-of-sample validation when evaluating model performance. We also demonstrate the improved performance of blended MLR/RFR models in predicting indoor air pollution.
Highlights: Indoor air pollution is an important determinant of personal exposure. We used multiple linear regression (MLR) and random forest regression (RFR) to model indoor PM2.5 concentrations. Blended models combining MLR and RFR approaches outperformed stand-alone models. RFR and blended MLR/RFR approaches show promise for modelling indoor pollution.
To our knowledge, this is the first application and evaluation of random forest regression and blended models for modelling indoor air pollution concentrations, and these techniques show promise for estimating exposure in health risk assessment and epidemiology.
- Is Part Of:
- Environmental pollution. Volume 245(2019)
- Journal:
- Environmental pollution
- Issue:
- Volume 245(2019)
- Issue Display:
- Volume 245, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 245
- Issue:
- 2019
- Issue Sort Value:
- 2019-0245-2019-0000
- Page Start:
- 746
- Page End:
- 753
- Publication Date:
- 2019-02
- Subjects:
- Pollution -- Periodicals
Pollution -- Environmental aspects -- Periodicals
Environmental Pollution -- Periodicals
Pollution -- Périodiques
Pollution -- Aspect de l'environnement -- Périodiques
Pollution -- Effets physiologiques -- Périodiques
Pollution
Pollution -- Environmental aspects
Periodicals
Electronic journals
363.73
- Journal URLs:
- http://www.sciencedirect.com/science/journal/02697491
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.envpol.2018.11.034
- Languages:
- English
- ISSNs:
- 0269-7491
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3791.539000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9423.xml
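
The abstract stresses out-of-sample validation through 10-fold cross-validation. The sketch below illustrates that idea only; it is not the authors' model. It assumes a single-predictor least-squares fit as a stand-in for MLR, uses synthetic data (the `outdoor` and `indoor` values are invented for illustration), and every function name is hypothetical. It compares the in-sample R² with the R² computed from pooled held-out predictions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k=10, seed=0):
    """Predict each point from a model fitted without its fold, then pool."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    predictions = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            predictions[i] = intercept + slope * xs[i]
    return r_squared(ys, predictions)


# Synthetic stand-in data (assumption): 447 points echo the sample size in
# the abstract, but the values themselves are invented.
rng = random.Random(42)
outdoor = [rng.uniform(10, 200) for _ in range(447)]
indoor = [5 + 0.3 * x + rng.gauss(0, 15) for x in outdoor]

intercept, slope = fit_line(outdoor, indoor)
in_sample_r2 = r_squared(indoor, [intercept + slope * x for x in outdoor])
cv_r2 = cross_validated_r2(outdoor, indoor, k=10)

print(f"in-sample R^2:       {in_sample_r2:.3f}")
print(f"10-fold CV R^2:      {cv_r2:.3f}")
```

For a least-squares fit the cross-validated R² cannot exceed the in-sample R², and the gap tends to widen for flexible learners such as random forests, which is one reason held-out evaluation matters when comparing model types.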