OTH-07 Identification of IBD cohorts from linked endoscopy and histology reports using natural language processing. (June 2019)
- Record Type:
- Journal Article
- Title:
- OTH-07 Identification of IBD cohorts from linked endoscopy and histology reports using natural language processing. (June 2019)
- Main Title:
- OTH-07 Identification of IBD cohorts from linked endoscopy and histology reports using natural language processing
- Authors:
- Brown, Jonathan
Zeki, Sebastian
- Abstract:
- Introduction: Patients with inflammatory bowel disease (IBD) are likely to undergo multiple lifetime endoscopic procedures which generate histopathological reports. Managing these patients requires clinicians to derive a phenotypic overview from numerous episodes and diverse sources, which can be time-consuming, incomplete and subjective. We set out to evaluate the potential for a computer to extract phenotypic parameters from a series of linked histopathology and endoscopy reports to characterise an IBD cohort.
Methods: 118,108 lower GI endoscopic procedure reports (200–017) and 62,051 lower GI histology reports (200–017) from GRH were imported into an SQL database. Unique patient identification numbers from the merged dataset were replaced with 128-bit hexadecimal GUIDs and all patient identifiable information was subsequently stripped from the data tables (Service Evaluation Project 8622). Text processing was undertaken in Python pandas dataframes: import both datasets and separate all words by single space, convert to lower case, remove apostrophes; correct spelling of key words using Levenshtein distance; find regular expressions that match disease phenotypes; exclude non-IBD colitis diagnoses; exclude negated IBD diagnoses; export tagged machine interpreted reports back to SQL database; select 100 random reports for each IBD confirmed or negated diagnosis to validate against original text; return to steps – to modify regular expression reference lists to improve sensitivity and specificity.
Results: The following results were obtained after multiple validation cycles initially based on an empiric regular expression dataset. Some caution is required in interpretation of the specificity of the Crohn's and ulcerative colitis histopathology reports. Many samples are described as showing features of both diseases and the final conclusion is given as a likelihood or unclassified. The specificities reported here are for all IBD and do not reflect a capacity to distinguish between the different types.
Conclusions: The evolution of the disease characteristic regular expressions through repeated validation cycles has provided a powerful tool for the automated generation of IBD databases from text in semi-structured endoscopy and histology reports. The potential for the scheduling of surveillance and linkage to other systems, such as primary care prescribing, is obvious. Further development will include a more detailed phenotypic interpretation and computation of the histopathological certainty in distinguishing the types of IBD.
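The text-processing steps listed in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not publish its key-word or regular-expression reference lists, so `KEY_WORDS`, the three patterns, the tag names and the distance threshold below are all hypothetical. It uses only the Python standard library; the abstract's version ran over pandas dataframes and an SQL database.

```python
import re
import uuid

# Hypothetical key-word list; the abstract's real reference lists are not published.
KEY_WORDS = ["crohns", "colitis", "ulcerative", "inflammatory"]

# Hypothetical patterns, for illustration only.
IBD = r"(crohns|ulcerative colitis|inflammatory bowel disease)"
IBD_PATTERN = re.compile(r"\b" + IBD + r"\b")
NON_IBD_PATTERN = re.compile(
    r"\b(ischaemic|infective|microscopic|collagenous|lymphocytic) colitis\b"
)
# A negation cue followed, within the same sentence, by an IBD term.
NEGATION_PATTERN = re.compile(r"\b(no|not|without)\b[^.]*\b" + IBD + r"\b")


def pseudonymise(patient_ids):
    """Map each distinct patient ID to a random 128-bit hexadecimal GUID."""
    return {pid: uuid.uuid4().hex for pid in set(patient_ids)}


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalise(text):
    """Lower-case, remove apostrophes, separate words by single spaces."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return " ".join(text.split())


def correct_spelling(text, max_distance=1):
    """Replace near-misses of key words; short words are left untouched."""
    corrected = []
    for token in text.split(" "):
        word = token.strip(".,;:()")
        if len(word) >= 6 and word not in KEY_WORDS:
            for key in KEY_WORDS:
                if levenshtein(word, key) <= max_distance:
                    token = token.replace(word, key)
                    break
        corrected.append(token)
    return " ".join(corrected)


def tag_report(raw):
    """Tag one report; non-IBD colitis and negated mentions are excluded first."""
    text = correct_spelling(normalise(raw))
    if NON_IBD_PATTERN.search(text):
        return "non_ibd_colitis"
    if NEGATION_PATTERN.search(text):
        return "ibd_negated"
    if IBD_PATTERN.search(text):
        return "ibd_confirmed"
    return "untagged"
```

With a pandas dataframe, a function such as `tag_report` could be applied to a report-text column through `Series.map`. The validation loop in the Methods (sampling tagged reports and comparing them with the original text) would then drive revisions to the pattern lists.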
- Is Part Of:
- Gut. Volume 68 (2019) Supplement 2
- Journal:
- Gut
- Issue:
- Volume 68 (2019) Supplement 2
- Issue Display:
- Volume 68, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 68
- Issue:
- 2
- Issue Sort Value:
- 2019-0068-0002-0000
- Page Start:
- A224
- Page End:
- A224
- Publication Date:
- 2019-06
- Subjects:
- Gastroenterology -- Periodicals
616.33
- Journal URLs:
- http://gut.bmjjournals.com
http://www.bmj.com/archive
- DOI:
- 10.1136/gutjnl-2019-BSGAbstracts.426
- Languages:
- English
- ISSNs:
- 0017-5749
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18573.xml