On estimating probability of presence from use–availability or presence–background data. Issue 6 (1st June 2013)
- Record Type:
- Journal Article
- Title:
- On estimating probability of presence from use–availability or presence–background data. Issue 6 (1st June 2013)
- Main Title:
- On estimating probability of presence from use–availability or presence–background data
- Authors:
- Phillips, Steven J.
Elith, Jane
- Abstract:
- A fundamental ecological modeling task is to estimate the probability that a species is present in (or uses) a site, conditional on environmental variables. For many species, available data consist of "presence" data (locations where the species [or evidence of it] has been observed), together with "background" data, a random sample of available environmental conditions. Recently published papers disagree on whether probability of presence is identifiable from such presence–background data alone. This paper aims to resolve the disagreement, demonstrating that additional information is required. We defined seven simulated species representing various simple shapes of response to environmental variables (constant, linear, convex, unimodal, S‐shaped) and ran five logistic model‐fitting methods using 1000 presence samples and 10 000 background samples; the simulations were repeated 100 times. The experiment revealed a stark contrast between two groups of methods: those based on a strong assumption that species' true probability of presence exactly matches a given parametric form had highly variable predictions and much larger RMS error than methods that take population prevalence (the fraction of sites in which the species is present) as an additional parameter. For six species, the former group grossly under‐ or overestimated probability of presence. The cause was not model structure or choice of link function, because all methods were logistic with linear and, where necessary, quadratic terms. Rather, the experiment demonstrates that an estimate of prevalence is not just helpful, but is necessary (except in special cases) for identifying probability of presence. We therefore advise against use of methods that rely on the strong assumption, due to Lele and Keim (recently advocated by Royle et al.) and Lancaster and Imbens. The methods are fragile, and their strong assumption is unlikely to be true in practice. We emphasize, however, that we are not arguing against standard statistical methods such as logistic regression, generalized linear models, and so forth, none of which requires the strong assumption. If probability of presence is required for a given application, there is no panacea for lack of data. Presence–background data must be augmented with an additional datum, e.g., species' prevalence, to reliably estimate absolute (rather than relative) probability of presence.
- Is Part Of:
- Ecology. Volume 94:Issue 6(2013)
- Journal:
- Ecology
- Issue:
- Volume 94:Issue 6(2013)
- Issue Display:
- Volume 94, Issue 6 (2013)
- Year:
- 2013
- Volume:
- 94
- Issue:
- 6
- Issue Sort Value:
- 2013-0094-0006-0000
- Page Start:
- 1409
- Page End:
- 1419
- Publication Date:
- 2013-06-01
- Subjects:
- availability -- background -- identifiability -- logistic -- measuring use vs. non-use -- presence–background -- prevalence -- resource selection -- species distribution model
Ecology -- Periodicals
Écologie -- Périodiques
Ecologie
Écologie
Écologie animale
Écologie végétale
Ecology
Periodicals
577.05
- Journal URLs:
- http://www.jstor.org/journals/00129658.html
http://www.esajournals.org/perlserv/?request=get-archive&issn=0012-9658
http://esajournals.onlinelibrary.wiley.com/hub/journal/10.1002/(ISSN)1939-9170/
http://onlinelibrary.wiley.com/
- DOI:
- 10.1890/12-1520.1
- Languages:
- English
- ISSNs:
- 0012-9658
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3650.000000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1285.xml