Fast, scalable, and automated identification of articles for biodiversity and macroecological datasets. Issue 1 (19th November 2020)

Record Type:: Journal Article
Title:: Fast, scalable, and automated identification of articles for biodiversity and macroecological datasets. Issue 1 (19th November 2020)
Main Title:: Fast, scalable, and automated identification of articles for biodiversity and macroecological datasets
Authors:: Cornford, Richard
Deinet, Stefanie
De Palma, Adriana
Hill, Samantha L. L.
McRae, Louise
Pettit, Benjamin
Marconi, Valentina
Purvis, Andy
Freeman, Robin
Editors:: Peres‐Neto, Pedro
Abstract:: Aim: Understanding broad‐scale ecological patterns and processes is necessary if we are to mitigate the consequences of anthropogenically driven biodiversity degradation. However, such analyses require large datasets and current data collation methods can be slow, involving extensive human input. Given rapid and ever‐increasing rates of scientific publication, manually identifying data sources among hundreds of thousands of articles is a significant challenge, which can create a bottleneck in the generation of ecological databases. Innovation: Here, we demonstrate the use of general, text‐classification approaches to identify relevant biodiversity articles. We apply this to two freely available example databases, the Living Planet Database and the database of the PREDICTS (Projecting Responses of Ecological Diversity in Changing Terrestrial Systems) project, both of which underpin important biodiversity indicators. We assess machine‐learning classifiers based on logistic regression (LR) and convolutional neural networks, and identify aspects of the text‐processing workflow that influence classification performance. Main conclusions: Our best classifiers can distinguish relevant from non‐relevant articles with over 90% accuracy. Using readily available abstracts and titles or abstracts alone produces significantly better results than using titles alone. LR and neural network models performed similarly. Crucially, we show that deploying such models on real‐world …
Is Part Of:: Global ecology & biogeography. Volume 30, Issue 1 (2021)
Journal:: Global ecology & biogeography
Issue:: Volume 30:Issue 1(2021)
Issue Display:: Volume 30, Issue 1 (2021)
Year:: 2021
Volume:: 30
Issue:: 1
Issue Sort Value:: 2021-0030-0001-0000
Page Start:: 339
Page End:: 347
Publication Date:: 2020-11-19
Subjects:: automated classification -- biodiversity indicators -- Biodiversity Intactness Index -- ecological data -- Living Planet Index -- machine learning -- text mining
Ecology -- Periodicals
Biogeography -- Periodicals
Biodiversity -- Periodicals
Macroevolution -- Periodicals
577
Journal URLs:: http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1466-8238
http://onlinelibrary.wiley.com/
DOI:: 10.1111/geb.13219
Languages:: English
ISSNs:: 1466-822X
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 4195.390700
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 16059.xml