Automatic information extraction from childhood cancer pathology reports. Issue 2 (16th June 2022)
- Record Type:
- Journal Article
- Title:
- Automatic information extraction from childhood cancer pathology reports. Issue 2 (16th June 2022)
- Main Title:
- Automatic information extraction from childhood cancer pathology reports
- Authors:
- Yoon, Hong-Jun
Peluso, Alina
Durbin, Eric B
Wu, Xiao-Cheng
Stroup, Antoinette
Doherty, Jennifer
Schwartz, Stephen
Wiggins, Charles
Coyle, Linda
Penberthy, Lynne
- Abstract:
- Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep learning-based information extraction models from cancer pathology reports based on the ICD-O-3 coding standard. In this article, we describe extending the models to perform ICCC classification.
Materials and Methods: We developed 2 models, ICD-O-3 classification and ICCC recoding (Model 1) and direct ICCC classification (Model 2), and 4 scenarios subject to the training sample size. We evaluated these models with a corpus consisting of 29 206 reports with age at diagnosis between 0 and 19 from 6 state cancer registries.
Results: Our findings suggest that the direct ICCC classification (Model 2) is substantially better than reusing the ICD-O-3 classification model (Model 1). Applying the uncertainty quantification mechanism to assess the confidence of the algorithm in assigning a code demonstrated that the model achieved a micro-F1 score of 0.987 while abstaining (not sufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports.
Conclusions: Our experimental results suggest that the machine learning-based automatic information extraction from childhood cancer pathology reports in the ICCC is a reliable means of supplementing human annotators at state cancer registries by reading and abstracting the majority of the childhood cancer pathology reports accurately and reliably.
Lay Summary: ICCC is the coding standard designed to categorize childhood cancers. However, machine learning-based ICCC classification has not been extensively studied, mainly owing to the limited volume of the pediatric cancer corpus; pediatric cancer is much less prevalent than adult cancers. Under the oversight of the National Childhood Cancer Registry project, we developed a deep learning-based text comprehension model for classifying ICCC from childhood cancer pathology reports. We performed a comparison study between (1) classifying ICD-O-3 codes and then recoding into ICCC and (2) classifying ICCC codes directly. We observed that the second approach exhibited a substantially higher accuracy score. We are aware that the low-precision models are not appropriate for this exercise because they will degrade the credibility of the model-based decisions. We applied an uncertainty quantification algorithm to the ICCC classification model. We achieved nearly perfect accuracy scores, while the model passed over 14.8% of ambiguous cases. This result means our machine learning model can serve human annotators at state cancer registries by processing 85.2% of the childhood cancer pathology reports automatically.
- Is Part Of:
- JAMIA open. Volume 5:Issue 2(2022)
- Journal:
- JAMIA open
- Issue:
- Volume 5:Issue 2(2022)
- Issue Display:
- Volume 5, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 5
- Issue:
- 2
- Issue Sort Value:
- 2022-0005-0002-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06-16
- Subjects:
- pediatric cancer -- cancer pathology reports -- information extraction -- machine learning
Medical informatics -- Periodicals
610.285
- Journal URLs:
- http://www.oxfordjournals.org/
https://academic.oup.com/jamiaopen
- DOI:
- 10.1093/jamiaopen/ooac049
- Languages:
- English
- ISSNs:
- 2574-2531
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22037.xml