Assessing the spatial sensitivity of a random forest model: Application in gridded population modeling. (May 2019)
- Record Type:
- Journal Article
- Title:
- Assessing the spatial sensitivity of a random forest model: Application in gridded population modeling. (May 2019)
- Main Title:
- Assessing the spatial sensitivity of a random forest model: Application in gridded population modeling
- Authors:
- Sinha, Parmanand
Gaughan, Andrea E.
Stevens, Forrest R.
Nieves, Jeremiah J.
Sorichetta, Alessandro
Tatem, Andrew J.
- Abstract:
- Gridded human population data provide a spatial denominator to identify populations at risk, quantify burdens, and inform our understanding of human-environment systems. When modeling gridded population, the information used for training the model may differ in spatial resolution from what the model prediction produces. This case arises when approaching population modeling from a top-down, dasymetric approach in which one redistributes coarse administrative unit level population data (i.e., source unit) to a finer scale (i.e., target unit). However, issues associated with differing variance across scales, spatial autocorrelation, and bias in sampling techniques are often overlooked. In this study, we examine the effects of intentionally biasing our sampling from the source to target scale within the context of a weighted, dasymetric mapping approach. The weighted component is based on a Random Forest estimator, which is a non-parametric ensemble-based prediction model. We investigate issues of autocorrelation and heterogeneity in the training data using 18 different types of samples to show the variations in training, census-level (i.e., source) and output, grid-level (i.e., target) predictions. We compare results to simple random sampling and geographically stratified random sampling. Results indicate that the Random Forest model is sensitive to the spatial autocorrelation inherent in the training data, which leads to an increase in the variance of the residuals. Sample training datasets that are at a spatial scale representative of the true population produced the best fitting models. However, the true representative dataset varied in autocorrelation for both scales. More attention is needed with ensemble-based learning and spatially-heterogeneous data as underlying issues of spatial autocorrelation influence results for both the census-level and grid-level estimations.
Highlights: Random forest is sensitive to spatial autocorrelation, and spatial representation of the true population is required for the best fitting models. Gridded population outputs are trained at a coarser level than the one for which they are created, such as the pixel or grid cell. The range of population density of the target data differs from the source, which can lead to underestimation of dispersion and also extremes in the distribution. We examined the effect of mismatch related to range, variability, and spatial structure in the spatial downscaling of population distribution. More research is needed on spatial ensemble learning approaches intended for spatial data with high autocorrelation and heterogeneity.
- Is Part Of:
- Computers, environment and urban systems. Volume 75(2019)
- Journal:
- Computers, environment and urban systems
- Issue:
- Volume 75(2019)
- Issue Display:
- Volume 75, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 75
- Issue:
- 2019
- Issue Sort Value:
- 2019-0075-2019-0000
- Page Start:
- 132
- Page End:
- 145
- Publication Date:
- 2019-05
- Subjects:
- Dasymetric modeling -- Random forest -- Spatial autocorrelation -- Gridded population modeling
City planning -- Data processing -- Periodicals
Regional planning -- Data processing -- Periodicals
303.4834
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01989715
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compenvurbsys.2019.01.006
- Languages:
- English
- ISSNs:
- 0198-9715
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.914000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9641.xml
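
The abstract describes a weighted, dasymetric approach in which population counts for coarse source units are redistributed to finer target units. The redistribution step alone might be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `dasymetric_redistribute`, its arguments, and the example values are hypothetical, and the per-cell weights are assumed to be already available (in the article they are based on a Random Forest estimator, which is not reproduced here).

```python
from collections import defaultdict


def dasymetric_redistribute(unit_totals, cell_units, cell_weights):
    """Split each source unit's population across its grid cells.

    unit_totals  : dict mapping source unit id -> population count
    cell_units   : list giving the source unit id of each grid cell
    cell_weights : list of non-negative weights, one per grid cell
                   (assumed given, e.g. from a fitted prediction model)
    Returns a list of per-cell population estimates in cell order.
    """
    # Sum the weights of all cells that fall in each source unit.
    weight_sums = defaultdict(float)
    for unit, weight in zip(cell_units, cell_weights):
        weight_sums[unit] += weight

    cell_population = []
    for unit, weight in zip(cell_units, cell_weights):
        total_weight = weight_sums[unit]
        if total_weight <= 0:
            # Hypothetical design choice: refuse to guess when a unit
            # carries no weight at all.
            raise ValueError(f"source unit {unit!r} has zero total weight")
        # Each cell receives its weight's share of the unit total, so the
        # cell values within a unit sum back to that unit's count.
        cell_population.append(unit_totals[unit] * weight / total_weight)
    return cell_population


# Illustrative values only: two source units, three grid cells.
example = dasymetric_redistribute(
    unit_totals={"A": 100.0, "B": 50.0},
    cell_units=["A", "A", "B"],
    cell_weights=[1.0, 3.0, 2.0],
)
```

Because each cell's share is normalised by the total weight of its own source unit, the grid-level estimates always sum back to the census-level counts, whatever the quality of the weights.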