Automatic data extraction from 24 hour blood pressure measurement reports of a large multicenter clinical trial. (February 2022)
- Record Type:
- Journal Article
- Title:
- Automatic data extraction from 24 hour blood pressure measurement reports of a large multicenter clinical trial. (February 2022)
- Main Title:
- Automatic data extraction from 24 hour blood pressure measurement reports of a large multicenter clinical trial
- Authors:
- Nolde, Janis M
Mian, Ajmal
Schlaich, Luca
Chan, Justine
Lugo-Gavidia, Leslie Marisol
Barrie, Nicola
Gopal, Vishal
Hillis, Graham S
Chow, Clara K
Schlaich, Markus P
- Abstract:
- Highlights: Ambulatory blood pressure monitoring (ABPM) is commonly represented in clinical routine and research by descriptive statistics summarizing individual measurements. Access to raw ABPM data, however, has advantages for statistical analysis, allows flexibility in defining diurnal and nocturnal periods, and enables the potential derivation of novel, clinically valuable indices. We developed a data extraction and processing algorithm, a database structure and validation tools to gain access to raw ABPM data collected as part of a clinical study. This process led to an efficient, flexible, accessible and accurate extraction of the data, which is henceforth available for scientific analysis.
Abstract: Background and objectives: Ambulatory blood pressure monitoring (ABPM) is usually reported as descriptive values such as circadian averages and standard deviations. Making use of the original, individual blood pressure measurements may be advantageous, particularly for research purposes, as this increases the flexibility of the analytical process, enables alternative statistical analyses and provides novel insights. Here we describe the development of a new multistep, hierarchical data extraction algorithm to collect raw data from .pdf reports and text files as part of a large multicenter clinical study. Methods: Original reports were saved in a nested file system, from which they were automatically extracted, read and saved into databases with custom-made programs written in Python 3. Data were further processed and cleaned, and relevant descriptive statistics such as averages and standard deviations were calculated according to a variety of definitions of day- and night-time. Additionally, data control mechanisms for manual review of the data and programmatic auto-detection of extraction errors were implemented as part of the project. Results: The developed algorithm extracted 97% of the data automatically; the missing data consisted mostly of reports that were saved incorrectly or not formatted in the specified way. Manual checks comparing samples of the extracted data with the original reports indicated a high level of accuracy, and no errors introduced by flaws in the extraction software were detected in the extracted dataset.
Conclusions: The developed multistep, hierarchical data extraction algorithm facilitated collection from different file formats and, paired with database cleaning and data processing steps, led to an effective and accurate assembly of raw ABPM data for further, adjustable analyses. Manual work was minimized while data quality was ensured with standardized, reproducible procedures.
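The abstract mentions computing averages and standard deviations "according to a variety of definitions of day- and night-time". A minimal Python sketch of that idea follows, assuming night is defined by fixed clock-time windows. The `Reading` record, the `NIGHT_WINDOWS` names and the window times are hypothetical illustrations, not the study's definitions, and definitions based on participant-reported sleep times are not covered.

```python
from dataclasses import dataclass
from datetime import time
from statistics import mean, stdev


@dataclass
class Reading:
    """One ambulatory measurement (hypothetical record layout)."""
    clock: time
    systolic: int
    diastolic: int


# Hypothetical night-time definitions: name -> (night start, night end).
# Windows may wrap around midnight, e.g. 22:00-06:00.
NIGHT_WINDOWS = {
    "fixed_22_06": (time(22, 0), time(6, 0)),
    "fixed_01_06": (time(1, 0), time(6, 0)),
}


def is_night(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def summarize(readings, start: time, end: time) -> dict:
    """Mean and SD of systolic BP for day and night under one definition."""
    night = [r.systolic for r in readings if is_night(r.clock, start, end)]
    day = [r.systolic for r in readings if not is_night(r.clock, start, end)]

    def stats(values):
        # statistics.stdev needs at least two values; report None otherwise
        return (mean(values) if values else None,
                stdev(values) if len(values) > 1 else None)

    return {"day": stats(day), "night": stats(night)}
```

Because the raw readings are kept, switching definitions only means calling `summarize(readings, *NIGHT_WINDOWS[name])` with another window, without re-extracting anything from the original reports.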
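The abstract also refers to "programmatic auto-detection of extraction errors" feeding a manual review step. One common way to do this is with plausibility checks on each extracted row. The sketch below is an assumption-based illustration of that approach, not the authors' implementation: the `ExtractedRow` layout and the `LIMITS` thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedRow:
    """One extracted measurement row (hypothetical layout)."""
    row_id: int
    systolic: int
    diastolic: int
    heart_rate: int


# Illustrative plausibility limits (assumed, not taken from the study).
LIMITS = {
    "systolic": (60, 280),
    "diastolic": (30, 160),
    "heart_rate": (25, 220),
}


def flag_row(row: ExtractedRow) -> list[str]:
    """Return reasons a row looks like an extraction error (empty if none)."""
    problems = []
    for field, (low, high) in LIMITS.items():
        value = getattr(row, field)
        if not low <= value <= high:
            problems.append(f"{field}={value} outside [{low}, {high}]")
    # Systolic should exceed diastolic; a swap often signals misread columns.
    if row.systolic <= row.diastolic:
        problems.append("systolic <= diastolic")
    return problems


def review_queue(rows) -> dict:
    """Map row_id -> problems, only for rows that need manual review."""
    queue = {}
    for row in rows:
        problems = flag_row(row)
        if problems:
            queue[row.row_id] = problems
    return queue
```

Rows that pass every check never reach the queue, so manual effort is limited to flagged rows. This matches the abstract's aim of minimizing manual work while keeping a review step.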
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 214(2022)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 214(2022)
- Issue Display:
- Volume 214, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 214
- Issue:
- 2022
- Issue Sort Value:
- 2022-0214-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-02
- Subjects:
- Data extraction -- Database -- Automation -- Blood pressure -- Cardiovascular risk
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2021.106588
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20631.xml