Deciphering ecology from statistical artefacts: Competing influence of sample size, prevalence and habitat specialization on species distribution models and how small evaluation datasets can inflate metrics of performance. Issue 3 (9th January 2020)
- Record Type:
- Journal Article
- Title:
- Deciphering ecology from statistical artefacts: Competing influence of sample size, prevalence and habitat specialization on species distribution models and how small evaluation datasets can inflate metrics of performance. Issue 3 (9th January 2020)
- Main Title:
- Deciphering ecology from statistical artefacts: Competing influence of sample size, prevalence and habitat specialization on species distribution models and how small evaluation datasets can inflate metrics of performance
- Authors:
- Hallman, Tyler A.
Robinson, William D.
- Editors:
- Real, Raimundo
- Abstract:
- Aim: Sample size and species characteristics, including prevalence and habitat specialization, can influence the predictive performance of species distribution models (SDMs). There is little agreement, however, on which metric of model performance to use. Here, we directly compare AUC and partial ROC as metrics of SDM performance through analyses on the effects of species traits and sample size on SDM performance.
Location: Three counties dominated by agricultural lands and coniferous forest in Oregon's Willamette Valley and Coast Range ecoregions.
Methods: We systematically reduced a large avian point count dataset to alter sample sizes of 22 species of songbird. We used boosted regression trees to run SDMs for each species, quantified habitat specialization, and used mixed effects models to compare the influence of sample size, prevalence, and habitat specialization on SDM performance, calculated as AUC and partial ROC, across species. We calculated AUC and partial ROC with subset and independent evaluation data separately to more comprehensively investigate differences in metrics.
Results: We found a positive quadratic effect of sample size and a strongly positive effect of habitat specialization on both metrics of model performance. We found a weak effect of prevalence on partial ROC and no effect in AUC. Contrary to expectations, when evaluated with subset evaluation data, partial ROC was consistently highest in models with the smallest sample sizes. These small sample size models had correspondingly small sample sizes in subset evaluation datasets. Partial ROC evaluated with independent data and AUC evaluated with subset or independent data showed the expected positive correlation between sample size and model performance.
Main Conclusions: We found that small evaluation datasets can artificially inflate partial ROC. With literature-recommended minimum SDM sample sizes as low as three, attention must be given to the effects of correspondingly low sample sizes in evaluation datasets.
- Is Part Of:
- Diversity & distributions. Volume 26:Issue 3(2020)
- Journal:
- Diversity & distributions
- Issue:
- Volume 26:Issue 3(2020)
- Issue Display:
- Volume 26, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 26
- Issue:
- 3
- Issue Sort Value:
- 2020-0026-0003-0000
- Page Start:
- 315
- Page End:
- 328
- Publication Date:
- 2020-01-09
- Subjects:
- AUC -- ecological niche model -- habitat specialization -- partial ROC -- prevalence -- sample size -- species distribution model
Biodiversity -- Periodicals
Biodiversity conservation -- Periodicals
577
- Journal URLs:
- http://www.blackwell-synergy.com/member/institutions/issuelist.asp?journal=ddi
http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1472-4642
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/ddi.13030
- Languages:
- English
- ISSNs:
- 1366-9516
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3604.271107
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12792.xml