Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data. (July 2022)
- Record Type:
- Journal Article
- Title:
- Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data. (July 2022)
- Main Title:
- Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data
- Authors:
- Xu, Junlin
Cui, Lingyu
Zhuang, Jujuan
Meng, Yajie
Bing, Pingping
He, Binsheng
Tian, Geng
Kwok Pui, Choi
Wu, Taoyang
Wang, Bing
Yang, Jialiang
- Abstract:
- Recent advances in single-cell RNA sequencing (scRNA-seq) provide exciting opportunities for transcriptome analysis at single-cell resolution. Clustering individual cells is a key step to reveal cell subtypes and infer cell lineage in scRNA-seq analysis. Although many dedicated algorithms have been proposed, clustering quality remains a computational challenge for scRNA-seq data, which is exacerbated by inflated zero counts due to various technical noise. To address this challenge, we assess the combinations of nine popular dropout imputation methods and eight clustering methods on a collection of 10 well-annotated scRNA-seq datasets with different sample sizes. Our results show that (i) imputation algorithms do typically improve the performance of clustering methods, and the quality of data visualization using t-Distributed Stochastic Neighbor Embedding; and (ii) the performance of a particular combination of imputation and clustering methods varies with dataset size. For example, the combination of single-cell analysis via expression recovery and Sparse Subspace Clustering (SSC) methods usually works well on smaller datasets, while the combination of adaptively-thresholded low-rank approximation and single-cell interpretation via multikernel learning (SIMLR) usually achieves the best performance on larger datasets.
Highlights: Assess the performance of nine popular imputation and eight clustering methods. The imputation algorithms improve the performance of clustering methods. Evaluate combinations of different methods on different-sized datasets. …
- Is Part Of:
- Computers in biology and medicine. Volume 146(2022)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 146(2022)
- Issue Display:
- Volume 146, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 146
- Issue:
- 2022
- Issue Sort Value:
- 2022-0146-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-07
- Subjects:
- Single-cell RNA sequencing -- Dropout imputation -- Cell clustering -- t-SNE -- Adjusted Rand index
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2022.105697
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22315.xml