A new epidemiological approach using machine learning based on medico-administrative database: automatic identification and prevalence estimation in heart failure. (14th October 2021)
- Record Type:
- Journal Article
- Title:
- A new epidemiological approach using machine learning based on medico-administrative database: automatic identification and prevalence estimation in heart failure. (14th October 2021)
- Main Title:
- A new epidemiological approach using machine learning based on medico-administrative database: automatic identification and prevalence estimation in heart failure
- Authors:
- Mustafic, H
Gouysse, M
Dore, 0
Perray, L
Maravic, M
Jourdain, P
- Abstract:
- Abstract: Introduction: Heart failure (HF) is a global pandemic. In developed countries, the prevalence of known HF is generally estimated at 1–2% of the general population; two thirds of those affected are 70 years old or over, and the prevalence among the elderly exceeds 10%. With the emergence of artificial intelligence in the service of health, machine learning approaches offer a wide variety of data models and strategies focused on algorithmic modeling, which could improve our understanding of the disease and create opportunities for intervention.
Purpose: To automatically identify people with HF and estimate its prevalence more precisely, both on a large scale and with regular updates, which remains a real challenge.
Methods: Two data sources were used, LPD and LRx, including nearly 2.5 and 40 million subjects, respectively. LPD is a medical database of 1,200 general practitioners and 100 cardiologists who participated in a permanent longitudinal observatory of ambulatory medicine prescriptions from 2018/01/01 to 2019/12/31. This database included 9,024 well-identified and treated HF subjects in 2019, whose data were used to train the algorithm. LRx is an anonymized database of medication dispensing in outpatient care. It includes a panel of 10,000 French retail pharmacies, representing nearly 45% of all retail pharmacies in continental France, which makes this large sample quite representative. Different machine learning algorithms (gradient boosting, logistic regression, random forest) were trained on the HF subjects identified in the LPD database, and their performance metrics were described. The model with the best performance metrics was chosen and deployed on the LRx database to identify HF subjects, describe their demographic characteristics, and calculate the prevalence for the period from 2019/11/01 to 2020/10/31 (from the extrapolated number of HF subjects in continental France during this period).
Results: The gradient boosting approach had the best performance metrics (sensitivity=85%, specificity=74%, positive predictive value=26%, negative predictive value=98%, positive likelihood ratio=3.3, negative likelihood ratio=0.2, odds ratio=16.4, F1 score=0.40). This model identified 658,329 HF subjects in LRx, 68.5% of whom were women, with a mean age of 70.7 years (±12.0). Patients aged ≥70 years represented 88.2% of the HF subject group. The extrapolated number of HF subjects in continental France in 2020 was 1,260,207, corresponding to a prevalence of 1.94% (the population of continental France was estimated at 64.9 million that year).
Conclusion: Machine learning approaches can produce a consistent and accurate HF prevalence estimate in a large country sample, with regular updates of HF subjects. With this new approach, we could follow a regularly updated cohort of HF subjects and improve national and local management strategies using epidemiological estimates well adapted to the current situation.
Funding Acknowledgement: Type of funding sources: None.
- Is Part Of:
- European heart journal. Volume 42(2021)Supplement 1
- Journal:
- European heart journal
- Issue:
- Volume 42(2021)Supplement 1
- Issue Display:
- Volume 42, Issue 1 (2021)
- Year:
- 2021
- Volume:
- 42
- Issue:
- 1
- Issue Sort Value:
- 2021-0042-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-10-14
- Subjects:
- Big Data Analysis
Cardiology -- Periodicals
Heart -- Diseases -- Periodicals
616.12005
- Journal URLs:
- http://eurheartj.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/eurheartj/ehab724.3164
- Languages:
- English
- ISSNs:
- 0195-668X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3829.717500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25630.xml
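
The derived figures quoted in the abstract's Results can be cross-checked from the rounded values it reports. The sketch below is not the authors' code; the function names are hypothetical and the formulas are the standard textbook definitions of likelihood ratios, diagnostic odds ratio, F1 score, and prevalence. Because the inputs are the rounded percentages from the abstract, the recomputed odds ratio comes out at about 16.1 rather than the reported 16.4, a gap that is presumably due to rounding.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio, defined as LR+ divided by LR-."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return lr_pos / lr_neg


def f1_score(ppv, sensitivity):
    """Harmonic mean of positive predictive value (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def prevalence_percent(cases, population):
    """Prevalence as a percentage of the population."""
    return 100 * cases / population


# Rounded values taken from the abstract.
SENSITIVITY = 0.85
SPECIFICITY = 0.74
PPV = 0.26
EXTRAPOLATED_CASES = 1_260_207
POPULATION = 64.9e6

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
print(round(lr_pos, 1), round(lr_neg, 1))                            # 3.3 0.2
print(round(diagnostic_odds_ratio(SENSITIVITY, SPECIFICITY), 1))     # 16.1
print(round(f1_score(PPV, SENSITIVITY), 2))                          # 0.4
print(round(prevalence_percent(EXTRAPOLATED_CASES, POPULATION), 2))  # 1.94
```

The likelihood ratios, F1 score, and prevalence recomputed this way match the abstract to the precision it reports.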