Global forensic geolocation with deep neural networks. Issue 4 (23rd June 2020)
- Record Type:
- Journal Article
- Title:
- Global forensic geolocation with deep neural networks. Issue 4 (23rd June 2020)
- Main Title:
- Global forensic geolocation with deep neural networks
- Authors:
- Grantham, Neal S.
Reich, Brian J.
Laber, Eric B.
Pacifici, Krishna
Dunn, Robert R.
Fierer, Noah
Gebert, Matthew
Allwood, Julia S.
Faith, Seth A.
- Abstract:
- Summary: An important problem in modern forensic analyses is identifying the provenance of materials at a crime scene, such as biological material on a piece of clothing. This procedure, which is known as geolocation, is conventionally guided by expert knowledge of the biological evidence and therefore tends to be application specific, labour intensive and often subjective. Purely data‐driven methods have yet to be fully realized in this domain, in part because of the lack of a sufficiently rich source of data. However, high throughput sequencing technologies can identify tens of thousands of fungi and bacteria taxa by using DNA recovered from a single swab collected from nearly any object or surface. This microbial community, or microbiome, may be highly informative of the provenance of the sample, but data on the spatial variation of microbiomes are sparse and high dimensional and have a complex dependence structure that renders them difficult to model with standard statistical tools. Deep learning algorithms have generated a tremendous amount of interest within the machine learning community for their predictive performance in high dimensional problems. We present DeepSpace: a new algorithm for geolocation that aggregates over an ensemble of deep neural network classifiers trained on randomly generated Voronoi partitions of a spatial domain. The DeepSpace algorithm makes remarkably good point predictions; for example, when applied to the microbiomes of over 1300 dust samples collected across continental USA, more than half of geolocation predictions produced by this model fall less than 100 km from their true origin, which is a 60% reduction in error from competing geolocation methods. Moreover, we apply DeepSpace to a novel data set of global dust samples collected from nearly 30 countries, finding that dust‐associated fungi alone predict a sample's country of origin with nearly 90% accuracy.
- Is Part Of:
- Journal of the Royal Statistical Society. Volume 69:Issue 4(2020)
- Journal:
- Journal of the Royal Statistical Society
- Issue:
- Volume 69:Issue 4(2020)
- Issue Display:
- Volume 69, Issue 4 (2020)
- Year:
- 2020
- Volume:
- 69
- Issue:
- 4
- Issue Sort Value:
- 2020-0069-0004-0000
- Page Start:
- 909
- Page End:
- 929
- Publication Date:
- 2020-06-23
- Subjects:
- Citizen science -- Machine learning -- Microbiome -- Non‐homogeneous Poisson process -- Spatial point pattern
Statistics -- Periodicals
519.5
- Journal URLs:
- http://rss.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)1467-9876/
https://academic.oup.com/jrsssc
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/rssc.12427
- Languages:
- English
- ISSNs:
- 0035-9254
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1580.000000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13762.xml