Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Issue 11 (16th October 2013)
- Record Type:
- Journal Article
- Title:
- Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Issue 11 (16th October 2013)
- Main Title:
- Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species
- Authors:
- Chao, Anne
Wang, Y. T.
Jost, Lou
- Editors:
- Warton, David
- Abstract:
- Summary: Estimating Shannon entropy and its exponential from incomplete samples is a central objective of many research fields. However, empirical estimates of Shannon entropy and its exponential depend strongly on sample size and typically exhibit substantial bias. This work uses a novel method to obtain an accurate, low‐bias analytic estimator of entropy, based on species frequency counts. Our estimator does not require prior knowledge of the number of species. We show that there is a close relationship between Shannon entropy and the species accumulation curve, which depicts the cumulative number of observed species as a function of sample size. We reformulate entropy in terms of the expected discovery rates of new species with respect to sample size, that is, the successive slopes of the species accumulation curve. Our estimator is obtained by applying slope estimators derived from an improved Good‐Turing frequency formula. Our method is also applied to estimate mutual information. Extensive simulations from theoretical models and real surveys show that if sample size is not unreasonably small, the resulting entropy estimator is nearly unbiased. Our estimator generally outperforms previous methods in terms of bias and accuracy (low mean squared error), especially when species richness is large and there is a large fraction of undetected species in samples. We discuss the extension of our approach to estimate Shannon entropy for multiple incidence data. The use of our estimator in constructing an integrated rarefaction and extrapolation curve of entropy (or mutual information) as a function of sample size or sample coverage (an aspect of sample completeness) is also discussed.
- Is Part Of:
- Methods in ecology and evolution. Volume 4:Issue 11(2013:Nov.)
- Journal:
- Methods in ecology and evolution
- Issue:
- Volume 4:Issue 11(2013:Nov.)
- Issue Display:
- Volume 4, Issue 11 (2013)
- Year:
- 2013
- Volume:
- 4
- Issue:
- 11
- Issue Sort Value:
- 2013-0004-0011-0000
- Page Start:
- 1091
- Page End:
- 1100
- Publication Date:
- 2013-10-16
- Subjects:
- diversity -- Good‐Turing frequency formula -- mutual information -- sample coverage -- Shannon entropy -- species accumulation curve -- species discovery rate
Ecology -- Periodicals
Evolution -- Periodicals
577
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)2041-210X
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/2041-210X.12108
- Languages:
- English
- ISSNs:
- 2041-210X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 937.xml
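
The abstract refers to the empirical estimate of Shannon entropy, to its exponential, and to Good‐Turing frequency counts. As a rough illustration only, the sketch below computes the empirical (plug‐in) entropy from species counts, its exponential, and the classical Good‐Turing sample coverage estimate (one minus the proportion of singletons). It is not the estimator proposed in the article, whose formula is not given in this record; the function names and the example counts are hypothetical.

```python
import math


def plugin_entropy(counts):
    """Empirical (plug-in) Shannon entropy, in nats, from species counts.

    This is the naive estimate that the abstract describes as biased for
    incomplete samples. It is NOT the article's proposed estimator.
    """
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def good_turing_coverage(counts):
    """Classical Good-Turing sample coverage estimate: 1 - f1 / n,

    where f1 is the number of species seen exactly once and n is the
    total number of individuals in the sample.
    """
    n = sum(counts)
    f1 = sum(1 for c in counts if c == 1)
    return 1.0 - f1 / n


# Hypothetical sample: individuals observed per species.
counts = [10, 5, 2, 1, 1, 1]

entropy = plugin_entropy(counts)
effective_species = math.exp(entropy)  # exponential of Shannon entropy
coverage = good_turing_coverage(counts)

print(f"plug-in entropy: {entropy:.4f} nats")
print(f"exp(entropy):    {effective_species:.4f}")
print(f"coverage:        {coverage:.2f}")
```

With three singletons among 20 individuals, the coverage estimate suggests that part of the community is still undetected, which is the situation in which the abstract says the plug‐in entropy is most biased.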