Injury severity on traffic crashes: A text mining with an interpretable machine-learning approach. (December 2020)
- Record Type:
- Journal Article
- Title:
- Injury severity on traffic crashes: A text mining with an interpretable machine-learning approach. (December 2020)
- Main Title:
- Injury severity on traffic crashes: A text mining with an interpretable machine-learning approach
- Authors:
- Arteaga, Cristian
Paz, Alexander
Park, JeeWoong
- Abstract:
- Highlights: A new approach to analyze crash narratives to identify contributing factors to injury severity. Interpretability of the results is enabled by the proposed approach. Results were compared and benchmarked against commonly used advanced regression analyses. Results included conclusions that cannot be generated using other existing techniques.
Abstract: The analysis of traffic crash severities provides significant information for the development of safety countermeasures. Most available traffic crash datasets contain rich information, including linguistic narratives with details about crash events and contexts, which can reveal new insights regarding severity and associated causality factors. Previous research has paid insufficient attention to this source of information. This study proposes an approach to analyze traffic crash narratives to identify factors associated with high injury-severity levels. The proposed approach explicitly seeks global interpretability of the results by expanding the capabilities of the Local Interpretable Model-Agnostic Explanations (LIME) method. Our proposed new approach, Global Cross-Validation LIME (GCV-LIME), aggregates individual LIME explanations using cross-validation. Thus, this study combines machine learning-based text mining with GCV-LIME to identify likely causality factors for injury severities while providing interpretability as required by traffic safety analysts. Data for heavy vehicle crashes collected from 2007 to 2017 in Queensland, Australia, were used to evaluate the proposed approach. Six different machine-learning models were tested, and global explanations were generated using GCV-LIME. The results indicated a strong association between a set of terms, such as "collided_headon," "side_collided," "motorcycle," "cab," and "pedestrian," and fatal crashes. Results from GCV-LIME were compared with those obtained using the corresponding available tabular data and classic regression analysis. The comparison suggests that the proposed approach has great potential to provide additional insights and can also confirm results obtained with classic analysis of tabular data. Results from GCV-LIME, combined with the knowledge and experience of safety analysts, can help establish effective safety countermeasures based on factors likely causing crashes and/or increasing their severity.
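The abstract describes GCV-LIME only at a high level: individual LIME explanations are aggregated using cross-validation to obtain global explanations. The sketch below is a minimal, hypothetical illustration of one way such an aggregation step could work, assuming each local explanation is already available as a mapping from a narrative term to its weight (for example, from a LIME text explainer applied to held-out narratives in each fold). The function name `aggregate_local_explanations`, the within-fold-then-across-fold averaging, and the `min_folds` stability filter are assumptions for illustration, not the authors' published procedure, and the weights in the usage example are invented toy values, not results from the study.

```python
from collections import defaultdict


def aggregate_local_explanations(folds, min_folds=1):
    """Hypothetical sketch: turn per-instance (LIME-style) term weights into a
    global ranking by averaging within each cross-validation fold, then across folds.

    folds: list of folds; each fold is a list of explanations, and each
           explanation is a dict mapping a term to its local weight.
    min_folds: keep only terms that appear in at least this many folds (assumed filter).
    Returns (term, mean_weight, n_folds) tuples sorted by mean_weight, highest first.
    """
    per_fold_means = []
    for fold in folds:
        sums, counts = defaultdict(float), defaultdict(int)
        for explanation in fold:
            for term, weight in explanation.items():
                sums[term] += weight
                counts[term] += 1
        # Mean local weight of each term over the explanations in which it appears.
        per_fold_means.append({term: sums[term] / counts[term] for term in sums})

    # Collect each term's fold-level means across all folds.
    across = defaultdict(list)
    for fold_means in per_fold_means:
        for term, value in fold_means.items():
            across[term].append(value)

    # Global weight = average of fold-level means; n_folds = folds where the term appeared.
    ranking = [
        (term, sum(values) / len(values), len(values))
        for term, values in across.items()
        if len(values) >= min_folds
    ]
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


if __name__ == "__main__":
    # Toy, invented weights for two folds (not data from the study).
    folds = [
        [{"collided_headon": 0.4, "cab": 0.1}, {"collided_headon": 0.2, "pedestrian": 0.3}],
        [{"collided_headon": 0.3, "motorcycle": 0.5}, {"pedestrian": 0.1, "cab": -0.1}],
    ]
    for term, weight, n_folds in aggregate_local_explanations(folds, min_folds=2):
        print(term, round(weight, 2), n_folds)
```

Requiring a term to appear in several folds favors terms whose association holds across data splits rather than being driven by a handful of narratives; ranking by mean absolute weight instead would order terms by influence regardless of direction.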
- Is Part Of:
- Safety science. Volume 132(2020)
- Journal:
- Safety science
- Issue:
- Volume 132(2020)
- Issue Display:
- Volume 132, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 132
- Issue:
- 2020
- Issue Sort Value:
- 2020-0132-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-12
- Subjects:
- Crash severity -- Text mining -- Machine learning -- Interpretable machine learning
Industrial accidents -- Periodicals
Accident Prevention -- Periodicals
Safety -- Periodicals
Travail -- Accidents -- Périodiques
363.11
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09257535
http://www.elsevier.com/journals
http://www.journals.elsevier.com/safety-science/
- DOI:
- 10.1016/j.ssci.2020.104988
- Languages:
- English
- ISSNs:
- 0925-7535
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8069.124900
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 14735.xml