Multi-faceted semantic clustering with text-derived phenotypes. (November 2021)
- Record Type:
- Journal Article
- Title:
- Multi-faceted semantic clustering with text-derived phenotypes. (November 2021)
- Main Title:
- Multi-faceted semantic clustering with text-derived phenotypes
- Authors:
- Slater, Luke T.
Williams, John A.
Karwath, Andreas
Fanning, Hilary
Ball, Simon
Schofield, Paul N.
Hoehndorf, Robert
Gkoutos, Georgios V.
- Abstract:
- Identification of ontology concepts in clinical narrative text enables the creation of phenotype profiles that can be associated with clinical entities, such as patients or drugs. Constructing patient phenotype profiles using formal ontologies enables their analysis via semantic similarity, in turn enabling the use of background knowledge in clustering or classification analyses.
However, traditional semantic similarity approaches collapse complex relationships between patient phenotypes into a unitary similarity score for each pair of patients. Moreover, single scores may be based only on matching terms with the greatest information content (IC), ignoring other dimensions of patient similarity. This process necessarily leads to a loss of information in the resulting representation of patient similarity, and is especially apparent when using very large text-derived and highly multi-morbid phenotype profiles. It also makes finding a biological explanation for similarity very difficult: the black box problem. In this article, we explore the generation of multiple semantic similarity scores for patients based on different facets of their phenotypic manifestation, which we define through different sub-graphs in the Human Phenotype Ontology. We further present a new methodology for deriving sets of qualitative class descriptions for groups of entities described by ontology terms. Leveraging this strategy to obtain meaningful explanations for our semantic clusters alongside other evaluation techniques, we show that semantic clustering with ontology-derived facets enables the representation, and thus the identification, of clinically relevant phenotype relationships not easily recoverable using overall clustering alone. In this way, we demonstrate the potential of faceted semantic clustering for gaining a deeper and more nuanced understanding of text-derived patient phenotypes.
Highlights: Semantic similarity is a powerful tool for gaining insight into biomedical data, but generally collapses relationships between complex entity descriptions into a single score, necessarily losing information. To solve this problem, we develop a method for splitting phenotype profiles into semantic categories, so that profiles can be compared with semantic similarity across several distinct features.
We evaluate this approach by performing semantic clustering on a sample of patients from MIMIC-III, comparing overall and faceted partitions. We also develop and present a novel method for identifying explanatory variables for semantic clusters. Using this method, we show that faceted semantic clustering facilitates recovery of clinically meaningful relationships between entities from text-derived phenotypes. …
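The abstract describes splitting each patient's phenotype profile into facets defined by ontology sub-graphs and computing a separate semantic similarity score per facet, rather than one overall score. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the toy ontology, its term labels, and the patient profiles are hypothetical stand-ins (not real Human Phenotype Ontology classes or MIMIC-III data); information content is estimated from annotation frequency in the toy corpus; and pairwise similarity uses Resnik's measure with best-match averaging, a common choice that the record does not confirm as the one used in the article.

```python
import math

# Hypothetical toy ontology (child -> parents). These labels are illustrative
# stand-ins, NOT real Human Phenotype Ontology classes or identifiers.
PARENTS = {
    "root": [],
    "cardio": ["root"],
    "renal": ["root"],
    "arrhythmia": ["cardio"],
    "afib": ["arrhythmia"],
    "hypertension": ["cardio"],
    "ckd": ["renal"],
    "proteinuria": ["renal"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(profiles):
    """IC(c) = log(n / count(c)); a profile counts for c if it has c or a descendant of c."""
    n = len(profiles)
    counts = {t: 0 for t in PARENTS}
    for profile in profiles:
        for t in set().union(*(ancestors(x) for x in profile)):
            counts[t] += 1
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


def resnik(a, b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic.get(t, 0.0) for t in ancestors(a) & ancestors(b))


def best_match_average(p, q, ic):
    """Symmetric best-match average of pairwise Resnik scores between two profiles."""
    if not p or not q:
        return 0.0
    p_best = [max(resnik(a, b, ic) for b in q) for a in p]
    q_best = [max(resnik(a, b, ic) for a in p) for b in q]
    return (sum(p_best) / len(p_best) + sum(q_best) / len(q_best)) / 2


def facet(profile, facet_root):
    """Keep only the terms that fall in the sub-graph rooted at facet_root."""
    return [t for t in profile if facet_root in ancestors(t)]


def faceted_similarity(p, q, facet_roots, ic):
    """Return one similarity score per facet instead of a single overall score."""
    return {r: best_match_average(facet(p, r), facet(q, r), ic) for r in facet_roots}


# Hypothetical patient profiles (toy data, not drawn from MIMIC-III).
profiles = {
    "patient_1": ["afib", "ckd"],
    "patient_2": ["hypertension", "ckd"],
    "patient_3": ["afib", "proteinuria"],
}
ic = information_content(list(profiles.values()))
overall = best_match_average(profiles["patient_1"], profiles["patient_3"], ic)
by_facet = faceted_similarity(
    profiles["patient_1"], profiles["patient_3"], ["cardio", "renal"], ic
)
print(f"overall: {overall:.3f}")
print({k: round(v, 3) for k, v in by_facet.items()})
```

For this toy data, the overall score for patients 1 and 3 (about 0.20) blends a shared arrhythmia-related match with an unrelated pair of renal terms, whereas the faceted scores keep the two dimensions apart (about 0.41 for the cardiovascular facet, 0 for the renal facet). Feeding per-facet score matrices, rather than the single overall matrix, into a clustering step is the kind of separation the abstract argues for.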
- Is Part Of:
- Computers in biology and medicine. Volume 138(2021)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 138(2021)
- Issue Display:
- Volume 138, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 138
- Issue:
- 2021
- Issue Sort Value:
- 2021-0138-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- Ontology -- Clustering -- MIMIC-III -- Semantic similarity -- Cluster explanation
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2021.104904
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19801.xml