Reuse-centric k-means configuration. Issue 100 (September 2021)
- Record Type:
- Journal Article
- Title:
- Reuse-centric k-means configuration. Issue 100 (September 2021)
- Main Title:
- Reuse-centric k-means configuration
- Authors:
- Zhang, Lijun
Guan, Hui
Ding, Yufei
Shen, Xipeng
Krim, Hamid
- Abstract:
- K-means configuration is to find a configuration of k-means (e.g., the number of clusters, feature sets) that maximize some objectives. It is a time-consuming process due to the iterative nature of k-means. This paper proposes reuse-centric k-means configuration to accelerate k-means configuration. It is based on the observation that the explorations of different configurations share lots of common or similar computations. Effectively reusing the computations from prior trials of different configurations could largely shorten the configuration time. To materialize the idea, the paper presents a set of novel techniques, including reuse-based filtering, center reuse, and a two-phase design to capitalize on the reuse opportunities on three levels: validation, number of clusters, and feature sets. Experiments on k-means–based data classification tasks show that reuse-centric k-means configuration can speed up a heuristic search-based configuration process by a factor of 5.8, and a uniform search-based attainment of classification error surfaces by a factor of 9.1. The paper meanwhile provides some important insights on how to effectively apply the acceleration techniques to tap into a full potential. Highlights: Computation reuse is useful to accelerate k-means configuration. The proposed reuse-centric techniques can accelerate k-means configuration by 5-9X. Reusing distances does not change k-means clustering results. Reusing cluster centers causes only little disparity on the quality of k-means results. …
- Is Part Of:
- Information systems. Issue 100(2021)
- Journal:
- Information systems
- Issue:
- Issue 100(2021)
- Issue Display:
- Volume 100, Issue 100 (2021)
- Year:
- 2021
- Volume:
- 100
- Issue:
- 100
- Issue Sort Value:
- 2021-0100-0100-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-09
- Subjects:
- K-means -- Algorithm configuration -- Computation reuse
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2021.101787
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17090.xml