Multi-label clinical document classification: Impact of label-density. (30th December 2019)
- Record Type:
- Journal Article
- Title:
- Multi-label clinical document classification: Impact of label-density. (30th December 2019)
- Main Title:
- Multi-label clinical document classification: Impact of label-density
- Authors:
- Blanco, Alberto
Casillas, Arantza
Pérez, Alicia
Diaz de Ilarraza, Arantza
- Abstract:
- Highlights: Expert clinicians manually assign codes from the ICD-10 to health records. Neural Networks (NNs) are well suited for multi-label classification tasks. Challenges: infer models from data with low label-density and capture dependencies. Experiments: three NNs with label-consistency rectification on two corpora. Contribution: (1) show dependence of performance on label density (2) release software.
Abstract: Objective: The goal of this work is the classification of Electronic Health Records using Natural Language Techniques. Electronic Health Records (EHRs) convey valuable clinical information, such as diagnoses and patient conditions. We explore several Deep Learning classification models for assigning multiple ICD codes to clinical documents. Within the framework of data mining, the aim of multi-label classification is to associate each instance with a set of labels.
Methods: Multi-label classification is typically carried out with multiple independent classifiers, in the so-called binary relevance learning approach. Nevertheless, diseases tend to be correlated, and independent classifiers are unable to model these relationships and do not guarantee the consistency of the predicted label-set. To tackle this, we investigate three Neural Network architectures. We study models that are capable of capturing and modeling label dependencies on the output layer. Moreover, learning from data with low label-density is an inherent challenge in multi-label classification. Thorough experiments were conducted to assess each architecture under different scenarios, varying the language, amount of data and label-density.
Results: The results showed that the Bi-GRU model outperforms the DNN and that both outperform the baseline (BLR). We observed better results with the MIMIC corpus than with the Osakidetza corpus. Experimental results showed that as the label-density decreases the prediction task becomes harder. Label-density appears to be closely related to the learning ability of the neural networks. Another important factor that affects the inference is the amount of training data.
Conclusions: The contributions of this work are: (a) a comparison among three classification approaches based on Neural Networks on data sets in English and Spanish to cope with the multi-label classification problem and (b) the study of the impact of label-density in prediction capabilities in the multi-label context.
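Note: the abstract refers to label-density without defining it. As a minimal sketch, assuming the definition commonly used in the multi-label literature (label cardinality, the average number of labels per document, divided by the total number of distinct labels), it could be computed as below. The function names and the toy ICD-10 code sets are illustrative assumptions, not taken from the article or its released software.

```python
from typing import List, Optional, Set


def label_cardinality(label_sets: List[Set[str]]) -> float:
    """Average number of labels assigned per document."""
    if not label_sets:
        raise ValueError("label_sets must contain at least one document")
    return sum(len(labels) for labels in label_sets) / len(label_sets)


def label_density(label_sets: List[Set[str]],
                  num_labels: Optional[int] = None) -> float:
    """Label cardinality divided by the size of the label vocabulary.

    If num_labels is not given, the vocabulary is assumed to be the set
    of labels that actually occur in label_sets.
    """
    cardinality = label_cardinality(label_sets)
    if num_labels is None:
        num_labels = len(set().union(*label_sets))
    if num_labels <= 0:
        raise ValueError("the label vocabulary must not be empty")
    return cardinality / num_labels


# Hypothetical toy corpus: each document carries a set of ICD-10 codes.
toy_corpus = [
    {"I10", "E11.9"},
    {"I10"},
    {"J45.909", "E11.9", "I10"},
    {"N39.0"},
]

if __name__ == "__main__":
    print("cardinality:", label_cardinality(toy_corpus))
    print("density (observed vocabulary):", label_density(toy_corpus))
    # A larger code inventory lowers the density for the same documents.
    print("density (100-code vocabulary):", label_density(toy_corpus, 100))
```

Under this definition, enlarging the label vocabulary while the number of codes per document stays fixed lowers the density, which is the setting the abstract describes as harder to learn from.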
- Is Part Of:
- Expert systems with applications. Volume 138(2019)
- Journal:
- Expert systems with applications
- Issue:
- Volume 138(2019)
- Issue Display:
- Volume 138, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 138
- Issue:
- 2019
- Issue Sort Value:
- 2019-0138-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-12-30
- Subjects:
- Multi-label classification -- Document classification -- Electronic health records -- ICD-10 classification
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2019.112835
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11805.xml