399 Using Machine Learning to Inform Extraction of Clinical Data from Sleep Study Reports. (3rd May 2021)
- Record Type:
- Journal Article
- Title:
- 399 Using Machine Learning to Inform Extraction of Clinical Data from Sleep Study Reports. (3rd May 2021)
- Main Title:
- 399 Using Machine Learning to Inform Extraction of Clinical Data from Sleep Study Reports
- Authors:
- Mazzotti, Diego
Staley, Bethany
Keenan, Brendan
Pack, Allan
Schwab, Richard
Boland, Mary Regina
- Abstract:
- Introduction: In-laboratory and home sleep studies are important tools for diagnosing sleep disorders. However, only a limited number of measurements are used to inform disease severity, and only specific measures, if any, are stored as structured fields in electronic health records (EHRs). We propose a sleep study data extraction approach based on supervised machine learning to facilitate the development of specialized format-specific parsers for large-scale automated sleep data extraction.
Methods: Using retrospective data from the Penn Medicine Sleep Center, we identified 64,100 sleep study reports stored in Microsoft Word documents of varying formats, recorded from 2001–2018. A random sample of 200 reports was selected for manual annotation of format (e.g., layout) and type (e.g., baseline, split-night, home sleep apnea tests). Using text mining tools, we extracted 71 document property features (e.g., section dimensions, paragraph and table elements, regular expression matches). We identified 14 different formats and 7 study types. We used these manual annotations as multiclass outcomes in a random forest classifier to evaluate prediction of sleep study format and type using document property features. Out-of-bag (OOB) error rates and multiclass area under the receiver operating characteristic curve (mAUC) were estimated to evaluate training and testing performance of each model.
Results: We successfully predicted sleep study format and type using random forest classifiers. Training OOB error rate was 5.6% for study format and 8.1% for study type. When evaluating these models in independent testing data, the mAUC for classification of study format was 0.85 and for study type was 1.00. When applied to the large universe of diagnostic sleep study reports, we successfully extracted hundreds of discrete fields in 38,252 reports representing 33,696 unique patients.
Conclusion: We accurately classified a sample of sleep study reports according to their format and type, using a random forest multiclass classification method. This informed the development and successful deployment of custom data extraction tools for sleep study reports. The ability to leverage these data can improve understanding of sleep disorders in the clinical setting and facilitate implementation of large-scale research studies within the EHR.
Support (if any): American Heart Association (20CDA35310360).
- Is Part Of:
- Sleep. Volume 44(2021)Supplement 2
- Journal:
- Sleep
- Issue:
- Volume 44(2021)Supplement 2
- Issue Display:
- Volume 44, Issue 2 (2021)
- Year:
- 2021
- Volume:
- 44
- Issue:
- 2
- Issue Sort Value:
- 2021-0044-0002-0000
- Page Start:
- A158
- Page End:
- A159
- Publication Date:
- 2021-05-03
- Subjects:
- Sleep -- Physiological aspects -- Periodicals
Sleep disorders -- Periodicals
Sommeil -- Aspect physiologique -- Périodiques
Sommeil, Troubles du -- Périodiques
Sleep disorders
Sleep -- Physiological aspects
Sleep -- physiological aspects
Sleep Wake Disorders
Psychophysiology
Electronic journals
Periodicals
616.8498
- Journal URLs:
- http://bibpurl.oclc.org/web/21399
http://www.journalsleep.org/
https://academic.oup.com/sleep
http://www.oxfordjournals.org/
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=369&action=archive
- DOI:
- 10.1093/sleep/zsab072.398
- Languages:
- English
- ISSNs:
- 0161-8105
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17286.xml
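
The Methods in the abstract describe turning each report into numeric "document property features" (paragraph elements, regular expression matches) before classification. The following is a minimal sketch of that idea in Python, not the authors' implementation: the function name `extract_features`, the pattern names, the regular expressions, and the sample report text are all hypothetical, and the abstract does not list the 71 features actually used. It also assumes the report has already been converted from Word to plain text.

```python
import re

# Hypothetical patterns for illustration only; the abstract does not
# state which regular expressions the authors matched.
PATTERNS = {
    "mentions_ahi": re.compile(r"\bAHI\b|apnea[- ]hypopnea index", re.IGNORECASE),
    "mentions_split_night": re.compile(r"split[- ]night", re.IGNORECASE),
    "mentions_home_test": re.compile(r"home sleep (?:apnea )?test", re.IGNORECASE),
}


def extract_features(text: str) -> dict:
    """Turn the plain text of one report into numeric document features."""
    # Treat blank-line-separated blocks as paragraphs (an assumption about
    # how the Word document was flattened to text).
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    features = {
        "n_paragraphs": len(paragraphs),
        "n_characters": len(text),
    }
    # One count feature per pattern: number of non-overlapping matches.
    for name, pattern in PATTERNS.items():
        features[name] = len(pattern.findall(text))
    return features


# Made-up report text, used only to exercise the function.
sample_report = (
    "SPLIT-NIGHT POLYSOMNOGRAPHY REPORT\n\n"
    "Diagnostic portion: AHI 32.1 events/hour.\n\n"
    "Treatment portion: the AHI decreased on CPAP."
)
features = extract_features(sample_report)
print(features)
```

Feature dictionaries of this kind, one per annotated report, could then be stacked into a table and passed to a random forest implementation that reports out-of-bag error (for example, scikit-learn's `RandomForestClassifier` with `oob_score=True`), with the manually annotated format or study type as the multiclass label.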