A Natural Language Processing Approach to Understanding Context in the Extraction and GeoCoding of Historical Floods, Storms, and Adaptation Measures. Issue 1 (January 2022)
- Record Type:
- Journal Article
- Title:
- A Natural Language Processing Approach to Understanding Context in the Extraction and GeoCoding of Historical Floods, Storms, and Adaptation Measures. Issue 1 (January 2022)
- Main Title:
- A Natural Language Processing Approach to Understanding Context in the Extraction and GeoCoding of Historical Floods, Storms, and Adaptation Measures
- Authors:
- Lai, Kelvin
Porter, Jeremy R.
Amodeo, Mike
Miller, David
Marston, Michael
Armal, Saman
- Abstract:
- Highlights:
● Leverage NLP to create a domain specific NER model from a diverse set of online media
● Rely on domain-specific statistical models, linguistics, and rule-based matching
● Bolsters the acceptable corpus formats and maintains similar accuracy and reliability
● Result is a highly reliable and geographically relevant dataset
● Find precise locations of nearly 650k flood events in the US in the past two decades
Abstract: Despite the known financial, economical, and humanitarian impacts of hurricanes and the floods that follow, datasets consisting of flood and flood risk reduction projects are either small in scope, lack in details, or held privately by commercial holders. However, with the amount of online data growing exponentially, we have seen a rise of information extraction techniques on unstructured text to drive insights. On one hand, social media in particular has seen a tremendous increase in popularity. On the other hand, despite this popularity, social media has proven to be unreliable and difficult to extract full information from. In contrast, online newspapers are often vetted by a journalist, and consist of more fine details. As a result, in this paper we leverage Natural Language Processing (NLP) to create a hybrid Named-Entity Recognition (NER) model that employs a domain-specific machine learning model, linguistic features, and rule-based matching to extract information from newspapers. To the knowledge of the authors, this model is the first of its kind to extract detailed flooding information and risk reduction projects over the entire contiguous United States. The approach used in this paper expands upon previous similar works by widening the geographical location and applying techniques to extract information over large documents, with minimal accuracy loss from the previous methods. Specifically, our model is able to extract information such as street closures, project costs, and metrics. Our validation indicates an F1 score of 72.13% for the NER model entity extraction, a binary classification location filter with a score of 73%, and an overall performance only 8.4% lower than a human validator against a gold-standard. Through this process, we find the location of 27,444 streets, 181,076 flood risk reduction projects, and 435,353 storm locations throughout the United States in the past two decades.
- Is Part Of:
- Information processing & management. Volume 59:Issue 1(2022)
- Journal:
- Information processing & management
- Issue:
- Volume 59:Issue 1(2022)
- Issue Display:
- Volume 59, Issue 1 (2022)
- Year:
- 2022
- Volume:
- 59
- Issue:
- 1
- Issue Sort Value:
- 2022-0059-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-01
- Subjects:
- Information Extraction -- Natural language processing -- Floods -- Machine Learning -- Newspapers
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2021.102735
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19853.xml