Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms. (July 2019)
- Record Type:
- Journal Article
- Title:
- Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms. (July 2019)
- Main Title:
- Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms
- Authors:
- Maniruzzaman, Md.
Jahanur Rahman, Md.
Ahammed, Benojir
Abedin, Md. Menhazul
Suri, Harman S.
Biswas, Mainak
El-Baz, Ayman
Bangeas, Petros
Tsoulfas, Georgios
Suri, Jasjit S.
- Abstract:
- Highlights: Identification of high risk differential gene expression using statistical tests. Development of a machine learning strategy for predicting the cancerous genes. Four statistical tests and ten machine learning classifiers were experimentally performed, validated and compared.
Objective: Colon microarray data is a repository of thousands of gene expressions with different strengths for each cancer cell. It is necessary to detect which genes are responsible for cancer growth. This study presents an exhaustive comparative study of different machine learning (ML) systems which serves two major purposes: (a) identification of high risk differential genes using statistical tests and (b) development of an ML strategy for predicting cancer genes.
Methods: Four statistical tests, namely Wilcoxon sign rank sum (WCSRS), t test, Kruskal–Wallis (KW), and F-test, were adapted for cancerous gene identification using their p-values. The extracted gene set was used to classify cancer patients using ten classifiers, namely linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB), Gaussian process classification (GPC), support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), decision tree (DT), Adaboost (AB), and random forest (RF). Performance was then evaluated using cross-validation protocols and standardized metrics, viz. accuracy (ACC) and area under the curve (AUC).
Results: The colon cancer dataset consists of 2000 genes from 62 patients (40 cancer vs. 22 control). The overall mean ACC of our ML system using all four statistical tests and all ten classifiers was 90.50%. The ML system showed an ACC of 99.81% using a combination of the WCSRS test and an RF-based classifier. This is an improvement of 8% over previously published values in the literature.
Conclusions: RF-based model with statistical tests for detection of high risk genes showed the best performance for accurate cancer classification in multi-center clinical trials.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 176(2019)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 176(2019)
- Issue Display:
- Volume 176, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 176
- Issue:
- 2019
- Issue Sort Value:
- 2019-0176-2019-0000
- Page Start:
- 173
- Page End:
- 193
- Publication Date:
- 2019-07
- Subjects:
- Colon cancer -- Gene expression data -- Prediction -- Statistical test -- Machine learning -- Performance
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2019.04.008
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10975.xml
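
The gene-identification step described in the abstract (ranking genes by the p-value of a statistical test between cancer and control samples) can be illustrated with a minimal sketch. This record does not give the authors' implementation, so everything below is an assumption made for illustration: the two-sample rank-sum form of the Wilcoxon test with a normal approximation and no tie correction, the 0.05 threshold, the function names, and the synthetic expression values. Only the group sizes (40 cancer, 22 control) come from the abstract.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided rank-sum p-value via the normal approximation.

    Assumption: no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                            # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2                  # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # its standard deviation
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))        # equals 2 * (1 - Phi(|z|))


def select_genes(expr, labels, alpha=0.05):
    """Return indices of genes whose p-value is below alpha.

    expr: one list of expression values per gene (one value per patient).
    labels: 1 for cancer, 0 for control. alpha is a hypothetical threshold.
    """
    selected = []
    for g, row in enumerate(expr):
        cancer = [v for v, lab in zip(row, labels) if lab == 1]
        control = [v for v, lab in zip(row, labels) if lab == 0]
        if rank_sum_pvalue(cancer, control) < alpha:
            selected.append(g)
    return selected


# Synthetic data: 20 hypothetical genes, the first 3 shifted in cancer samples.
random.seed(0)
labels = [1] * 40 + [0] * 22
expr = []
for g in range(20):
    shift = 2.0 if g < 3 else 0.0
    expr.append([random.gauss(shift if lab == 1 else 0.0, 1.0) for lab in labels])

selected = select_genes(expr, labels)
print(selected)
```

In a fuller pipeline, the selected gene indices would then feed a classifier evaluated under cross-validation, as the abstract describes; that stage is omitted here to keep the sketch dependency-free.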