Automatic classification of free-text medical causes from death certificates for reactive mortality surveillance in France. (November 2019)
- Record Type:
- Journal Article
- Title:
- Automatic classification of free-text medical causes from death certificates for reactive mortality surveillance in France. (November 2019)
- Main Title:
- Automatic classification of free-text medical causes from death certificates for reactive mortality surveillance in France
- Authors:
- Baghdadi, Yasmine
Bourrée, Alix
Robert, Aude
Rey, Grégoire
Gallay, Anne
Zweigenbaum, Pierre
Grouin, Cyril
Fouillet, Anne
- Abstract:
- Highlights: Rule-based method and SVM2 model displayed high classification performance. Misclassification errors were not specific to a mortality syndromic group. These methods are suitable for the daily analysis of free-text causes of death.
Abstract: Background: Mortality surveillance is of fundamental importance to public health surveillance. The real-time recording of death certificates, thanks to the Electronic Death Registration System (EDRS), provides valuable data for reactive mortality surveillance based on medical causes of death in free-text format. Reactive mortality surveillance is based on the monitoring of mortality syndromic groups (MSGs). An MSG is a cluster of medical causes of death (pathologies, syndromes or symptoms) that meets the objectives of early detection and impact assessment of public health events. The aim of this study is to implement and measure the performance of a rule-based method and two supervised models for automatic classification of free-text causes of death from death certificates, in order to implement them for routine surveillance.
Method: A rule-based method was implemented using four processing steps: standardization rules, splitting causes of death using delimiters, spelling corrections and dictionary projection. A supervised machine learning method using a linear Support Vector Machine (SVM) classifier was also implemented. Two models were produced using different features (SVM1 based solely on surface features and SVM2 combining surface features and MSGs classified by the rule-based method as feature vectors). The evaluation was conducted using an annotated subset of electronic death certificates received between 2012 and 2016. Classification performance was evaluated on seven MSGs (Influenza, Low respiratory diseases, Asphyxia/abnormal respiration, Acute respiratory disease, Sepsis, Chronic digestive diseases, and Chronic endocrine diseases).
Results: The rule-based method and the SVM2 model displayed high performance, with F-measures over 0.94 for all MSGs. Precision and recall were slightly higher for the rule-based method and the SVM2 model. An error analysis showed that errors were not specific to an MSG.
Conclusion: The high performance of the rule-based method and SVM2 model will allow us to set up a reactive mortality surveillance system based on free-text death certificates. This surveillance will be an added value for public health decision making.
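The four rule-based steps named in the abstract (standardization, splitting on delimiters, spelling correction, dictionary projection) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary `MSG_DICTIONARY`, the delimiter set, the similarity cutoff and all function names are assumptions made for illustration, and spelling correction is approximated here with the standard-library `difflib` fuzzy matcher. Only the MSG labels are taken from the abstract.

```python
import difflib
import re
import unicodedata

# Hypothetical toy dictionary: normalized French expression -> MSG label.
# The MSG labels come from the abstract; the entries are illustrative only.
MSG_DICTIONARY = {
    "grippe": "Influenza",
    "sepsis": "Sepsis",
    "choc septique": "Sepsis",
    "detresse respiratoire aigue": "Acute respiratory disease",
}

# Assumed delimiters between causes written on one certificate line.
DELIMITERS = re.compile(r"[,;/+]")


def standardize(text):
    """Step 1: lowercase, strip accents, drop stray symbols, collapse spaces."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9,;/+ ]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_causes(text):
    """Step 2: split a standardized line into individual causes."""
    return [part.strip() for part in DELIMITERS.split(text) if part.strip()]


def correct_spelling(cause, cutoff=0.85):
    """Step 3: map a cause to its closest dictionary entry, if close enough."""
    matches = difflib.get_close_matches(
        cause, list(MSG_DICTIONARY), n=1, cutoff=cutoff
    )
    return matches[0] if matches else cause


def classify(free_text):
    """Step 4: project each corrected cause onto the dictionary of MSGs."""
    msgs = set()
    for cause in split_causes(standardize(free_text)):
        label = MSG_DICTIONARY.get(correct_spelling(cause))
        if label is not None:
            msgs.add(label)
    return msgs
```

Under these assumptions, `classify("Choc septique; Gripe")` yields the Sepsis and Influenza groups (the misspelling "Gripe" is matched to "grippe"), while a cause absent from the toy dictionary yields an empty set. In the SVM2 model described above, such rule-based labels would be supplied to the classifier as additional features alongside the surface features.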
- Is Part Of:
- International journal of medical informatics. Volume 131(2019)
- Journal:
- International journal of medical informatics
- Issue:
- Volume 131(2019)
- Issue Display:
- Volume 131, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 131
- Issue:
- 2019
- Issue Sort Value:
- 2019-0131-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-11
- Subjects:
- CépiDc epidemiology center on medical causes of death -- EDRS Electronic Death Registration System -- ICD10 International Classification of Diseases 10th revision -- MSG Mortality Syndromic Group -- NLP Natural Language Processing -- OOV out-of-vocabulary -- SVM Support Vector Machine -- WHO World Health Organization
Automatic classification -- Rule-based method -- SVM -- Evaluation performance -- Medical causes of death -- Syndromic surveillance
Medical informatics -- Periodicals
Information science -- Periodicals
Computers -- Periodicals
Medical technology -- Periodicals
Medical Informatics -- Periodicals
Technology, Medical -- Periodicals
Computers
Information science
Medical informatics
Medical technology
Electronic journals
Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/13865056
http://www.clinicalkey.com/dura/browse/journalIssue/13865056
http://www.clinicalkey.com.au/dura/browse/journalIssue/13865056
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ijmedinf.2019.06.022
- Languages:
- English
- ISSNs:
- 1386-5056
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.345250
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11840.xml