Development of predictive models to identify advanced-stage cancer patients in a US healthcare claims database. (August 2019)
- Record Type:
- Journal Article
- Title:
- Development of predictive models to identify advanced-stage cancer patients in a US healthcare claims database. (August 2019)
- Main Title:
- Development of predictive models to identify advanced-stage cancer patients in a US healthcare claims database
- Authors:
- Esposito, Daina B.
Russo, Leo
Oksen, Dina
Yin, Ruihua
Desai, Vibha C.A.
Lyons, Jennifer G.
Verpillat, Patrice
Peñalvo, Jose L.
Lamy, Francois-Xavier
Lanes, Stephan
- Abstract:
- Highlights: Supervised machine learning was used to identify metastatic/advanced stage cancer in claims data. For five sites, each model identified advanced stage status with a high PPV. Validation studies can be used to identify cancer stage in claims databases.
Abstract: Background: Although healthcare databases are a valuable source for real-world oncology data, cancer stage is often lacking. We developed predictive models using claims data to identify metastatic/advanced-stage patients with ovarian cancer, urothelial carcinoma, gastric adenocarcinoma, Merkel cell carcinoma (MCC), and non-small cell lung cancer (NSCLC). Methods: Patients with ≥1 diagnosis of a cancer of interest were identified in the HealthCore Integrated Research Database (HIRD), a United States (US) healthcare database (2010–2016). Data were linked to three US state cancer registries and the HealthCore Integrated Research Environment Oncology database to identify cancer stage. Predictive models were constructed to estimate the probability of metastatic/advanced stage. Predictors available in the HIRD were identified and coefficients estimated by Least Absolute Shrinkage and Selection Operator (LASSO) regression with cross-validation to control overfitting. Classification error rates and receiver operating characteristic curves were used to select probability thresholds for classifying patients as cases of metastatic/advanced cancer. Results: We used 2723 ovarian cancer, 6522 urothelial carcinoma, 1441 gastric adenocarcinoma, 109 MCC, and 12,373 NSCLC cases of early and metastatic/advanced cancer to develop predictive models. All models had high discrimination (C > 0.85). At thresholds selected for each model, PPVs were all >0.75: ovarian cancer = 0.95 (95% confidence interval [95% CI]: 0.94–0.96), urothelial carcinoma = 0.78 (95% CI: 0.70–0.86), gastric adenocarcinoma = 0.86 (95% CI: 0.83–0.88), MCC = 0.77 (95% CI: 0.68–0.89), and NSCLC = 0.91 (95% CI: 0.90–0.92). Conclusion: Predictive modeling was used to identify five types of metastatic/advanced cancer in a healthcare claims database with greater accuracy than previous methods. …
- Is Part Of:
- Cancer epidemiology. Volume 61(2019:Aug.)
- Journal:
- Cancer epidemiology
- Issue:
- Volume 61(2019:Aug.)
- Issue Display:
- Volume 61 (2019)
- Year:
- 2019
- Volume:
- 61
- Issue Sort Value:
- 2019-0061-0000-0000
- Page Start:
- 30
- Page End:
- 37
- Publication Date:
- 2019-08
- Subjects:
- Ovarian cancer -- Urothelial carcinoma -- Gastric adenocarcinoma -- Merkel cell carcinoma -- Non-small cell lung cancer -- Predictive modeling -- Machine learning -- LASSO regression
Cancer -- Epidemiology -- Periodicals
Cancer -- Prevention -- Periodicals
Cancer -- Diagnosis -- Periodicals
Carcinogenesis -- Periodicals
616.994005
- Journal URLs:
- http://www.sciencedirect.com/science/journal/18777821
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.canep.2019.05.006
- Languages:
- English
- ISSNs:
- 1877-7821
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3046.477910
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14200.xml
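
Illustrative note on the abstract's Methods: the abstract describes choosing a probability threshold from classification error rates and then reporting the PPV with a 95% CI at that threshold. The sketch below shows one way such a step could look. It is not the authors' code: the toy probabilities and labels, the candidate thresholds, the function names, and the use of a Wald (normal-approximation) interval are all assumptions made for illustration, and the ROC-curve part of the selection is not reproduced.

```python
import math

def confusion_at(probs, labels, threshold):
    """Count TP, FP, TN, FN when prob >= threshold is classified as advanced stage."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_advanced = p >= threshold
        if predicted_advanced and y:
            tp += 1
        elif predicted_advanced and not y:
            fp += 1
        elif not predicted_advanced and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def pick_threshold(probs, labels, candidates):
    """Return (threshold, error_rate) with the lowest classification error.

    Ties keep the earliest candidate (an arbitrary choice for this sketch).
    """
    best = None
    for t in candidates:
        tp, fp, tn, fn = confusion_at(probs, labels, t)
        err = (fp + fn) / len(labels)
        if best is None or err < best[1]:
            best = (t, err)
    return best

def ppv_with_ci(tp, fp, z=1.96):
    """PPV with a Wald 95% CI, clipped to [0, 1] (assumed interval type)."""
    n = tp + fp
    if n == 0:
        return None
    ppv = tp / n
    half_width = z * math.sqrt(ppv * (1 - ppv) / n)
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)

# Hypothetical model outputs: predicted probability of advanced stage,
# paired with registry-confirmed stage (1 = metastatic/advanced, 0 = early).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
candidates = [0.25, 0.5, 0.75]

threshold, error_rate = pick_threshold(probs, labels, candidates)
tp, fp, tn, fn = confusion_at(probs, labels, threshold)
ppv, ci_low, ci_high = ppv_with_ci(tp, fp)
print(f"threshold={threshold}, error={error_rate:.2f}, "
      f"PPV={ppv:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

With only ten toy patients the interval is very wide. The cohorts reported in the abstract are far larger, which is consistent with the narrow intervals it reports for ovarian cancer and NSCLC.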