Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data. Issue 1 (2nd January 2019)
- Record Type:
- Journal Article
- Title:
- Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data. Issue 1 (2nd January 2019)
- Main Title:
- Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data
- Authors:
- Godichon-Baggioni, Antoine
Maugis-Rabusseau, Cathy
Rau, Andrea
- Abstract:
- Although there is no shortage of clustering algorithms proposed in the literature, the question of the most relevant strategy for clustering compositional data (i.e. data whose rows belong to the simplex) remains largely unexplored in cases where the observed value is equal to or close to zero for one or more samples. This work is motivated by the analysis of two applications, both focused on the categorization of compositional profiles: (1) identifying groups of co-expressed genes from high-throughput RNA sequencing data, in which a given gene may be completely silent in one or more experimental conditions; and (2) finding patterns in the usage of stations over the course of one week in the Velib' bicycle sharing system in Paris, France. For both of these applications, we make use of appropriately chosen data transformations, including the Centered Log Ratio and a novel extension called the Log Centered Log Ratio, in conjunction with the K-means algorithm. We use a non-asymptotic penalized criterion, whose penalty is calibrated with the slope heuristics, to select the number of clusters. Finally, we illustrate the performance of this clustering strategy, which is implemented in the Bioconductor package coseq, on both the gene expression and bicycle sharing system data.
- Is Part Of:
- Journal of applied statistics. Volume 46:Issue 1(2019)
- Journal:
- Journal of applied statistics
- Issue:
- Volume 46:Issue 1(2019)
- Issue Display:
- Volume 46, Issue 1 (2019)
- Year:
- 2019
- Volume:
- 46
- Issue:
- 1
- Issue Sort Value:
- 2019-0046-0001-0000
- Page Start:
- 47
- Page End:
- 65
- Publication Date:
- 2019-01-02
- Subjects:
- Clustering -- compositional data -- data transformations -- K-means
Statistics -- Periodicals
519.5
- Journal URLs:
- http://www.tandfonline.com/loi/cjas20
http://www.tandfonline.com/
- DOI:
- 10.1080/02664763.2018.1454894
- Languages:
- English
- ISSNs:
- 0266-4763
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4947.110000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8387.xml
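
The abstract in this record describes applying the Centered Log Ratio (CLR) transformation to compositional profiles before running K-means. The following is a minimal sketch of that general idea only, not the authors' implementation and not the coseq package API. The function names (`clr`, `kmeans`), the toy `counts` data, and the small `pseudo_count` added so that the logarithm is defined are all assumptions made for illustration; the Log Centered Log Ratio and the slope-heuristics selection of the number of clusters are not shown.

```python
import math
import random


def clr(row, pseudo_count=1e-6):
    """Centered log ratio of one profile.

    pseudo_count is an illustrative assumption that keeps log() defined
    when a component is zero; it is not the paper's treatment of zeros.
    """
    shifted = [v + pseudo_count for v in row]
    total = sum(shifted)
    logs = [math.log(v / total) for v in shifted]  # log of the proportions
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]    # components sum to zero


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means with randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy counts: rows are profiles, columns are conditions.
counts = [[90, 5, 5], [80, 10, 10], [5, 90, 5], [10, 80, 10]]
transformed = [clr(row) for row in counts]
labels, centers = kmeans(transformed, k=2)
print(labels)
```

In this toy example the first two profiles concentrate on the first condition and the last two on the second, so they should end up in separate clusters. A fixed `seed` is used only to make the sketch reproducible.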