Scalable generalized median graph estimation and its manifold use in bioinformatics, clustering, classification, and indexing. Issue 100 (September 2021)
- Record Type:
- Journal Article
- Title:
- Scalable generalized median graph estimation and its manifold use in bioinformatics, clustering, classification, and indexing. Issue 100 (September 2021)
- Main Title:
- Scalable generalized median graph estimation and its manifold use in bioinformatics, clustering, classification, and indexing
- Authors:
- Blumenthal, David B.
Boria, Nicolas
Bougleux, Sébastien
Brun, Luc
Gamper, Johann
Gaüzère, Benoit - Abstract:
- Abstract: In this paper, we present GMG-BCU — a local search algorithm based on block coordinate update for estimating a generalized median graph for a given collection of labeled or unlabeled input graphs. Unlike all competitors, GMG-BCU is designed for both discrete and continuous label spaces and can be configured to run in linear time w. r. t. the size of the graph collection whenever median node and edge labels are computable in linear time. These properties make GMG-BCU usable for applications such as differential microbiome data analysis, graph classification, clustering, and indexing. We also prove theoretical properties of generalized median graphs, namely, that they exist under reasonable assumptions which are met in almost all application scenarios, that they are in general non-unique, that they are NP -hard to compute and APX -hard to approximate, and that no polynomial α -approximation exists for any α unless the graph isomorphism problem is in P . Extensive experiments on six different datasets show that our heuristic GMG-BCU always outperforms the state of the art in terms of runtime or quality (on most datasets, both w. r. t. runtime and quality), that it is the only available heuristic which can cope with collections containing several thousands of graphs, and that it shows very promising potential when used for the aforementioned applications. GMG-BCU is freely available on GitHub: https://github.com/dbblumenthal/gedlib/ .
- Is Part Of:
- Information systems. Issue 100(2021)
- Journal:
- Information systems
- Issue:
- Issue 100(2021)
- Issue Display:
- Volume 100, Issue 100 (2021)
- Year:
- 2021
- Volume:
- 100
- Issue:
- 100
- Issue Sort Value:
- 2021-0100-0100-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-09
- Subjects:
- 68R01 -- 05C85 -- 90C59
Generalized median graphs -- Graph edit distance -- Graph similarity search -- Clustering -- Classification -- Indexing
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7 - Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.is.2021.101766 ↗
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 17090.xml