Development and validation of a pancreatic cancer risk model for the general population using electronic health records: An observational study. (January 2021)
- Record Type:
- Journal Article
- Title:
- Development and validation of a pancreatic cancer risk model for the general population using electronic health records: An observational study. (January 2021)
- Main Title:
- Development and validation of a pancreatic cancer risk model for the general population using electronic health records: An observational study
- Authors:
- Appelbaum, Limor
Cambronero, José P.
Stevens, Jennifer P.
Horng, Steven
Pollick, Karla
Silva, George
Haneuse, Sebastien
Piatkowski, Gail
Benhaga, Nordine
Duey, Stacey
Stevenson, Mary A.
Mamon, Harvey
Kaplan, Irving D.
Rinard, Martin C.
- Abstract:
- Aim: Pancreatic ductal adenocarcinoma (PDAC) is often diagnosed at a late, incurable stage. We sought to determine whether individuals at high risk of developing PDAC could be identified early using routinely collected data.
Methods: Electronic health record (EHR) databases from two independent hospitals in Boston, Massachusetts, providing inpatient, outpatient, and emergency care, from 1979 through 2017, were used with case–control matching. PDAC cases were selected using International Classification of Diseases 9/10 codes and validated with tumour registries. A data-driven feature selection approach was used to develop neural networks and L2-regularised logistic regression (LR) models on training data (594 cases, 100,787 controls) and compared with a published model based on hand-selected diagnoses ('baseline'). Model performance was validated on an external database (408 cases, 160,185 controls). Three prediction lead times (180, 270 and 365 days) were considered.
Results: The LR model had the best performance, with an area under the curve (AUC) of 0.71 (confidence interval [CI]: 0.67–0.76) for the training set, and AUC 0.68 (CI: 0.65–0.71) for the validation set, 365 days before diagnosis. Data-driven feature selection improved results over 'baseline' (AUC = 0.55; CI: 0.52–0.58). The LR model flags 2692 (CI: 2592–2791) of 156,485 as high risk, 365 days in advance, identifying 25 (CI: 16–36) cancer patients. Risk stratification showed that the high-risk group presented a cancer rate 3 to 5 times the prevalence in our data set.
Conclusion: A simple EHR model, based on diagnoses, can identify high-risk individuals for PDAC up to one year in advance. This inexpensive, systematic approach may serve as the first sieve for selection of individuals for PDAC screening programs.
Highlights: Medical records can be used to identify people at high risk for pancreatic cancer. The high-risk group identified 6–12 months before diagnosis, allowing early detection. A data-driven approach is superior to hand-selected features for model prediction. External validation of the model shows generalisability to new data.
- Is Part Of:
- European journal of cancer. Volume 143(2021)
- Journal:
- European journal of cancer
- Issue:
- Volume 143(2021)
- Issue Display:
- Volume 143, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 143
- Issue:
- 2021
- Issue Sort Value:
- 2021-0143-2021-0000
- Page Start:
- 19
- Page End:
- 30
- Publication Date:
- 2021-01
- Subjects:
- Pancreatic carcinoma -- Adenocarcinoma -- Electronic health records -- Logistic regression models -- AUC
Cancer -- Periodicals
Neoplasms -- Periodicals
Cancer -- Périodiques
Cancer
Tumors
Electronic journals
Periodicals
616.994
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09598049
http://rzblx1.uni-regensburg.de/ezeit/warpto.phtml?colors=7&jour_id=2879
http://www.clinicalkey.com/dura/browse/journalIssue/09598049
http://www.clinicalkey.com.au/dura/browse/journalIssue/09598049
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ejca.2020.10.019
- Languages:
- English
- ISSNs:
- 0959-8049
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3829.725100
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15312.xml
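
The abstract's risk-stratification claim (a high-risk group with a cancer rate 3 to 5 times the prevalence) can be sanity-checked from the figures it quotes. The sketch below is illustrative only: the abstract does not say how prevalence was computed, so approximating it as cases / (cases + controls) in the validation database is an assumption, and the variable and function names are hypothetical.

```python
# Figures quoted in the abstract (external validation, 365-day lead time).
flagged_high_risk = 2692        # individuals the LR model flags as high risk
flagged_cases = 25              # PDAC cases among those flagged
validation_cases = 408          # cases in the external validation database
validation_controls = 160_185   # controls in the external validation database


def rate(cases: int, total: int) -> float:
    """Return the fraction of `total` individuals who are cases."""
    return cases / total


# Cancer rate inside the flagged high-risk group.
high_risk_rate = rate(flagged_cases, flagged_high_risk)

# ASSUMPTION: overall prevalence approximated from the validation database
# as cases / (cases + controls); the abstract does not give this formula.
prevalence = rate(validation_cases, validation_cases + validation_controls)

# How many times more common cancer is in the flagged group than overall.
enrichment = high_risk_rate / prevalence

print(f"high-risk group rate: {high_risk_rate:.4%}")
print(f"approx. prevalence:   {prevalence:.4%}")
print(f"enrichment factor:    {enrichment:.2f}x")
```

Under these assumptions the enrichment factor comes out between 3 and 4, which is consistent with the 3 to 5 times range stated in the abstract.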