A machine learning approach to identify cases of cerebral palsy using the UK primary care database. (November 2018)
- Record Type:
- Journal Article
- Title:
- A machine learning approach to identify cases of cerebral palsy using the UK primary care database. (November 2018)
- Main Title:
- A machine learning approach to identify cases of cerebral palsy using the UK primary care database
- Authors:
- Fan, Heng
Li, Leah
Gilbert, Ruth
O'Callaghan, Finbar
Wijlaars, Linda
- Abstract:
- Background: Cerebral palsy is a complex condition that can manifest in different ways, and diagnosis is likely to be under-recorded in primary care databases. This study aimed to identify potential unrecorded cases based on other available information in patients' medical records.
Methods: A machine learning approach was used to identify likely cases of cerebral palsy in live births between Jan 1, 1990, and April 30, 2016, in the Clinical Practice Research Datalink (CPRD), a UK primary care database. Firstly, we made a preliminary selection of predictor variables (medical and drug codes) by comparing their relative frequencies associated with known cases and with the remaining non-cases; secondly, we reduced the number of variables using the random forest method based on a resampled balanced population; thirdly, we used a logistic regression model with selected codes to predict the probability for cerebral palsy; and lastly, the medical records of identified likely cases were manually reviewed with expert clinical knowledge to validate the cases. Scientific approval for this study was given by the CPRD Independent Scientific Advisory Committee.
Findings: Of 485 709 live births, 664 (0·14%) were initially identified as known cases of cerebral palsy using 43 validated diagnostic codes. 175 of 31 605 codes in the records were discovered more frequently in known cases of cerebral palsy than in non-cases. 35 of the most informative codes (eg, skeletal muscle relaxants, prematurity, and being seen in paediatric clinic) were selected and used to build the logistic prediction model, which yielded 787 most likely cases (with predicted probability for cerebral palsy ≥0·975). On the basis of evidence of both motor disorder and brain injury, after manual review of medical records, 405 children were validated as cases additional to the known cases, resulting in a cerebral palsy prevalence of 0·22% in live births, which is comparable to existing evidence.
Interpretation: Data-driven schemes, such as random forests, have the potential of identifying the most informative predictors in a cost-effective way to reliably identify potential unrecorded cases of cerebral palsy or other complex medical conditions in primary care databases.
Funding: Economic and Social Research Council (grant ref ES/L007517/1).
- Is Part Of:
- Lancet. Volume 392(2018)Supplement 2
- Journal:
- Lancet
- Issue:
- Volume 392(2018)Supplement 2
- Issue Display:
- Volume 392, Issue 2 (2018)
- Year:
- 2018
- Volume:
- 392
- Issue:
- 2
- Issue Sort Value:
- 2018-0392-0002-0000
- Page Start:
- S33
- Page End:
- Publication Date:
- 2018-11
- Subjects:
- Medicine -- Periodicals
Medicine
Electronic journals
Periodicals
610.5
- Journal URLs:
- http://www.thelancet.com/
http://www.sciencedirect.com/science/journal/01406736
http://www.elsevier.com/journals
- DOI:
- 10.1016/S0140-6736(18)32077-4
- Languages:
- English
- ISSNs:
- 0140-6736
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5146.000000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8755.xml
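Note on the abstract's figures: the two prevalence percentages reported in the Findings can be reproduced from the counts given there. The snippet below is only an arithmetic check, not part of the study's method; the variable and function names are illustrative assumptions, and the only inputs are the numbers stated in the abstract (485 709 live births, 664 known cases, 405 additionally validated cases).

```python
# Counts as stated in the abstract's Findings.
live_births = 485_709
known_cases = 664            # identified via 43 validated diagnostic codes
validated_additional = 405   # confirmed by manual review of likely cases


def prevalence_percent(cases: int, population: int) -> float:
    """Return prevalence as a percentage of the population."""
    return 100 * cases / population


# Prevalence from recorded diagnoses alone.
initial = prevalence_percent(known_cases, live_births)

# Prevalence after adding the manually validated cases.
final = prevalence_percent(known_cases + validated_additional, live_births)

print(f"known cases only: {initial:.2f}%")         # prints "known cases only: 0.14%"
print(f"known plus validated: {final:.2f}%")       # prints "known plus validated: 0.22%"
```

Both results agree with the 0·14% and 0·22% reported in the abstract when rounded to two decimal places.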