BatteryDataExtractor: battery-aware text-mining software embedded with BERT models. Issue 39 (23rd September 2022)
- Record Type:
- Journal Article
- Title:
- BatteryDataExtractor: battery-aware text-mining software embedded with BERT models. Issue 39 (23rd September 2022)
- Main Title:
- BatteryDataExtractor: battery-aware text-mining software embedded with BERT models
- Authors:
- Huang, Shu
Cole, Jacqueline M.
- Abstract:
- BatteryDataExtractor is the first property-specific text-mining tool for auto-generating databases of materials and their property, device, and associated characteristics. The software has been constructed by embedding the BatteryBERT model.
Due to the massive growth of scientific publications, literature mining is becoming increasingly popular for researchers to thoroughly explore scientific text and extract such data to create new databases or augment existing databases. Efforts in literature-mining software design and implementation have improved text-mining productivity, but most of the toolkits that mine text are based on traditional machine-learning-algorithms which hinder the performance of downstream text-mining tasks. Natural-language processing (NLP) and text-mining technologies have seen a rapid development since the release of transformer models, such as bidirectional encoder representations from transformers (BERT). Upgrading rule-based or machine-learning-based literature-mining toolkits by embedding transformer models into the software is therefore likely to improve their text-mining performance. To this end, we release a Python-based literature-mining toolkit for the field of battery materials, BatteryDataExtractor, which involves the embedding of BatteryBERT models in its automated data-extraction pipeline. This pipeline employs BERT models for token-classification tasks, such as abbreviation detection, part-of-speech tagging, and chemical-named-entity recognition, as well as new double-turn question-answering data-extraction models for auto-generating repositories of inter-related material and property data as well as general information. We demonstrate that BatteryDataExtractor exhibits state-of-the-art performance on the evaluation data sets for both token classification and automated data extraction. To aid the use of BatteryDataExtractor, its code is provided as open-source software, with associated documentation to serve as a user guide.
- Is Part Of:
- Chemical science. Volume 13:Issue 39(2022)
- Journal:
- Chemical science
- Issue:
- Volume 13:Issue 39(2022)
- Issue Display:
- Volume 13, Issue 39 (2022)
- Year:
- 2022
- Volume:
- 13
- Issue:
- 39
- Issue Sort Value:
- 2022-0013-0039-0000
- Page Start:
- 11487
- Page End:
- 11495
- Publication Date:
- 2022-09-23
- Subjects:
- Chemistry -- Periodicals
540.5
- Journal URLs:
- http://pubs.rsc.org/en/Journals/JournalIssues/SC
http://www.rsc.org/
- DOI:
- 10.1039/d2sc04322j
- Languages:
- English
- ISSNs:
- 2041-6520
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3151.490000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 24104.xml