PTU-105 Automated, algorithm based extraction of Barrett's surveillance metrics from natural language text is reliable. (June 2019)
- Record Type:
- Journal Article
- Title:
- PTU-105 Automated, algorithm based extraction of Barrett's surveillance metrics from natural language text is reliable. (June 2019)
- Main Title:
- PTU-105 Automated, algorithm based extraction of Barrett's surveillance metrics from natural language text is reliable
- Authors:
- Zeki, Sebastian
Hackett, Richard
Dunn, Jason
Bancil, Aaron
Preston, Sean
Chin-Aleong, Joanne
Brown, Jonathan
McDonald, Stuart
- Abstract:
- Abstract : Introduction: Patients with Barrett's oesophagus (BE) undergo regular endoscopic surveillance with a view to earlier oesophageal adenocarcinoma detection. Quality monitoring of this programme relies on manual extraction of elements from pathology and endoscopic semi-structured free text reports. Manual extraction is laborious and a significant hindrance to robust, large-scale and reproducible quality monitoring.
EndoMineR, a package written in R (a free, open source computational language), has been developed specifically to automate the extraction of data from endoscopic and associated pathology reports (1). It contains functions to clean, format and extract elements from free text and to calculate quality metrics for a range of conditions, including BE.
Aim: Using the EndoMineR package, we assessed the accuracy of the BE extraction algorithms for both endoscopic and pathological elements on pathology data only, as this is the 'worst case scenario' input data. The functions being assessed were: 1. The extraction of a Prague score, 2. The extraction of the worst pathology grade, 3. The site of biopsied tissue, 4. The site and type of any therapy in the upper GI tract.
Methods: Ethics was approved (IRAS number ). 60 patient episodes between 14 January 2016 and 30 March 2016 with full text pathology data only were acquired from 8 departments in central London as a training set. Validation was performed on a further 100 pathology reports. The therapy algorithm was run on a further 100 reports.
Abstract PTU-105 Table 1: Sensitivity, specificity, positive and negative predictive values of each of the functions being assessed.
Results: Reports were written by 11 different pathologists. The readability index of all the text, using the Flesch-Kincaid readability index, was 11.7 (sd: 1.22), indicating an average grammatical complexity. Sensitivity was excellent for all algorithms, especially given the difficult input text (Table 1). A reduction in specificity in the detection of worst pathology occurred because of dual reporting of colonoscopy and gastroscopy tissue, which also affected the sensitivity of the Pathology Site detection. Variability in how intestinalisation was reported also affected the specificity.
Conclusion: Reproducible extraction can be done from semi-structured text.
Adding part-of-speech tagging and term mapping will further improve the results. Such data extraction will allow for upstream automation of quality monitoring, governance and novel metrics.
References: 1. Zeki S (2018). EndoMineR for the extraction of endoscopic and associated pathology data from medical reports. Journal of Open Source Software;3(24):701.
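The abstract does not reproduce the extraction rules or the Table 1 figures, so the following is only a minimal Python sketch of the kind of evaluation it describes: a pattern-based extractor for a Prague score and the four accuracy measures named in the table caption. The regular expression, the function names `extract_prague` and `diagnostic_metrics`, and the four report snippets are all hypothetical illustrations; none of them comes from EndoMineR or from the study data.

```python
import re

# Hypothetical pattern: Prague criteria are conventionally written C<n>M<n>
# (circumferential and maximal extent in cm), e.g. "C2M5" or "C 1 M 3".
PRAGUE_RE = re.compile(r"\bC\s*(\d{1,2})\s*M\s*(\d{1,2})\b", re.IGNORECASE)


def extract_prague(text):
    """Return (C, M) as integers if a Prague score is found, else None."""
    match = PRAGUE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def diagnostic_metrics(predicted, actual):
    """Compute sensitivity, specificity, PPV and NPV from paired booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # true positives
    tn = sum(not p and not a for p, a in pairs)  # true negatives
    fp = sum(p and not a for p, a in pairs)      # false positives
    fn = sum(not p and a for p, a in pairs)      # false negatives

    def ratio(num, den):
        # Avoid division by zero when a class is absent from the sample.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up snippets with a manual label: does the text state a Prague score?
REPORTS = [
    ("Barrett's oesophagus, C2M5. Biopsies taken.", True),
    ("Columnar lined oesophagus C 1 M 3 with intestinal metaplasia.", True),
    # Score written out in prose; the simple pattern above misses it.
    ("Prague classification: circumferential 2 cm, maximal 5 cm.", True),
    ("Normal squamous mucosa. No Barrett's seen.", False),
]

predicted = [extract_prague(text) is not None for text, _ in REPORTS]
actual = [label for _, label in REPORTS]
metrics = diagnostic_metrics(predicted, actual)
```

The third snippet shows how variability in reporting style can lower sensitivity for a purely pattern-based extractor, which is the kind of effect the Results attribute to inconsistent reporting.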
- Is Part Of:
- Gut. Volume 68(2019)Supplement 2
- Journal:
- Gut
- Issue:
- Volume 68(2019)Supplement 2
- Issue Display:
- Volume 68, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 68
- Issue:
- 2
- Issue Sort Value:
- 2019-0068-0002-0000
- Page Start:
- A243
- Page End:
- A243
- Publication Date:
- 2019-06
- Subjects:
- Gastroenterology -- Periodicals
616.33
- Journal URLs:
- http://gut.bmjjournals.com
http://www.bmj.com/archive
- DOI:
- 10.1136/gutjnl-2019-BSGAbstracts.464
- Languages:
- English
- ISSNs:
- 0017-5749
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19009.xml