FAIRVASC: A semantic web approach to rare disease registry integration. (June 2022)
- Record Type:
- Journal Article
- Title:
- FAIRVASC: A semantic web approach to rare disease registry integration. (June 2022)
- Main Title:
- FAIRVASC: A semantic web approach to rare disease registry integration
- Authors:
- McGlinn, Kris
Rutherford, Matthew A.
Gisslander, Karl
Hederman, Lucy
Little, Mark A.
O'Sullivan, Declan
- Abstract:
- Abstract: Rare disease data is often fragmented within multiple heterogeneous siloed regional disease registries, each containing a small number of cases. These data are particularly sensitive, as low subject counts make the identification of patients more likely, meaning registries are not inclined to share subject level data outside their registries.
At the same time access to multiple rare disease datasets is important as it will lead to new research opportunities and analysis over larger cohorts. To enable this, two major challenges must therefore be overcome. The first is to integrate data at a semantic level, so that it is possible to query over registries and return results which are comparable. The second is to enable queries which do not take subject level data from the registries. To meet the first challenge, this paper presents the FAIRVASC ontology to manage data related to the rare disease anti-neutrophil cytoplasmic antibody (ANCA) associated vasculitis (AAV), which is based on the harmonisation of terms in seven European data registries. It has been built upon a set of key clinical questions developed by a team of experts in vasculitis selected from the registry sites and makes use of several standard classifications, such as Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) and Orphacode. It also presents the method for adding semantic meaning to AAV data across the registries using the declarative Relational to Resource Description Framework Mapping Language (R2RML). To meet the second challenge a federated querying approach is presented for accessing aggregated and pseudonymized data, and which supports analysis of AAV data in a manner which protects patient privacy. For additional security the federated querying approach is augmented with a method for auditing queries (and the uplift process) using the provenance ontology (PROV-O) to track when queries and changes occur and by whom. The main contribution of this work is the successful application of semantic web technologies and federated queries to provide a novel infrastructure that can readily incorporate additional registries, thus providing access to harmonised data relating to unprecedented numbers of patients with rare disease, while also meeting data privacy and security concerns. 
Highlights: The FAIRVASC ontology to manage data related to the rare disease anti-neutrophil cytoplasmic antibody (ANCA) associated vasculitis (AAV). A description of the R2RML mappings for converting ANCA data into RDF, along with sample queries for querying the data. A description of the infrastructure to enable federated querying, along with a snapshot of a query interface. Challenges and recommendations for clinical registries wishing to repeat the process.
- Is Part Of:
- Computers in biology and medicine. Volume 145(2022)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 145(2022)
- Issue Display:
- Volume 145, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 145
- Issue:
- 2022
- Issue Sort Value:
- 2022-0145-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- Knowledge engineering -- Linked data -- Ontologies -- Federated queries -- Rare diseases
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2022.105313
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21547.xml