Detecting outliers in species distribution data. (15th November 2017)
- Record Type:
- Journal Article
- Title:
- Detecting outliers in species distribution data. (15th November 2017)
- Main Title:
- Detecting outliers in species distribution data
- Authors:
- Liu, Canran
White, Matt
Newell, Graeme
- Abstract:
- Aim: Species distribution data play a pivotal role in the study of ecology, evolution, biogeography and biodiversity conservation. Although large amounts of location data are available and accessible from public databases, data quality remains problematic. Of the potential sources of error, positional errors are critical for spatial applications, particularly where these errors place observations beyond the environmental or geographical range of species. These outliers need to be identified, checked and removed to improve data quality and minimize the impact on subsequent analyses. Manually checking all species records within large multispecies datasets is prohibitively costly. This work investigates algorithms that may assist in the efficient vetting of outliers in such large datasets. Location: We used real, spatially explicit environmental data derived from the western part of Victoria, Australia, and simulated species distributions within this same region. Methods: By adapting species distribution modelling (SDM), we developed a pseudo‐SDM approach for detecting outliers in species distribution data, which was implemented with random forest (RF) and support vector machine (SVM), resulting in two new methods: RF_pdSDM and SVM_pdSDM. Using virtual species, we compared eight existing multivariate outlier detection methods with these two new methods under various conditions. Results: The two new methods based on the pseudo‐SDM approach had higher true skill statistic (TSS) values than other approaches, with TSS values always exceeding 0. More than 70% of the true outliers in datasets for species with a low and intermediate prevalence can be identified by checking 10% of the data points with the highest outlier scores. Main conclusions: Pseudo‐SDM‐based methods were more effective than other outlier detection methods. However, this outlier detection procedure can only be considered as a screening tool, and putative outliers must be examined by experts to determine whether they are actual errors or important records within an inherently biased set of data.
- Is Part Of:
- Journal of biogeography. Volume 45:Number 1(2018:Jan.)
- Journal:
- Journal of biogeography
- Issue:
- Volume 45:Number 1(2018:Jan.)
- Issue Display:
- Volume 45, Issue 1 (2018)
- Year:
- 2018
- Volume:
- 45
- Issue:
- 1
- Issue Sort Value:
- 2018-0045-0001-0000
- Page Start:
- 164
- Page End:
- 176
- Publication Date:
- 2017-11-15
- Subjects:
- outlier -- outlier detection -- random forest -- species distribution -- species distribution modelling -- support vector machine -- virtual species
Biogeography -- Periodicals
578.09
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1365-2699
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/jbi.13122
- Languages:
- English
- ISSNs:
- 0305-0270
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4952.900000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5610.xml