A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets. (1st May 2019)
- Record Type:
- Journal Article
- Title:
- A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets. (1st May 2019)
- Main Title:
- A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets
- Authors:
- Sayed, Sabah
Nassef, Mohammad
Badr, Amr
Farag, Ibrahim
- Abstract:
- Highlights: Feature selection over high-dimensional colon cancer Microarray datasets. Features selected from both Gene Expression and DNA-Methylation Microarray datasets. Resultant six biomarker genes for colon cancer validated using Enrichment Analysis. Biomarker genes validated on independent datasets with 99.9% classification accuracy.
Abstract: Cancer is a dangerous disease that causes death worldwide. Discovering a few genes relevant to one cancer disease can result in effective treatments. The challenge associated with Microarray datasets is their high dimensionality: the number of features is huge compared to the modest number of samples. Recent research efforts have attempted to reduce this high dimensionality using different feature selection techniques. This paper presents an ensemble feature selection technique based on the t-test and a genetic algorithm. After preprocessing the data using the t-test, a Nested Genetic Algorithm, namely Nested-GA, is used to get the optimal subset of features by combining data from two different datasets. Nested-GA consists of two nested genetic algorithms (outer and inner) that run on two different kinds of datasets. The Outer Genetic Algorithm (OGA-SVM) works on Microarray gene expression datasets, whereas the Inner Genetic Algorithm (IGA-NNW) runs on DNA Methylation datasets. Nested-GA is performed on a colon cancer dataset with 5-fold cross validation. After applying Nested-GA, the Incremental Feature Selection (IFS) strategy is used to get the smallest optimal gene subset. The gene subset has been validated on an independent dataset, resulting in 99.9% classification accuracy. The biological significance of the resulting optimal genes is then validated using Enrichment Analysis. Moreover, the results of Nested-GA have been compared to the results of other feature selection algorithms that have been run on either Gene Expression or DNA Methylation datasets. In the experimental results, Nested-GA showed the highest classification performance with a small optimal feature subset compared to the other algorithms. Furthermore, running Nested-GA on lung cancer datasets that contain two different cancer subtypes resulted in significantly better classification accuracy (98.4%) than a previous study (84.6%) that utilized lung cancer DNA-Methylation data only.
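The abstract names two steps that bracket the genetic search: a t-test pre-filter and Incremental Feature Selection (IFS). The sketch below illustrates only those two steps on synthetic data. It is not the authors' implementation: the genetic algorithms are omitted, a nearest-centroid classifier stands in for the SVM and neural network, a single held-out split replaces 5-fold cross validation, and every function name and the toy dataset are assumptions made for illustration.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (used for ranking only, no p-value)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_by_t(X, y):
    """Return feature indices sorted by decreasing |t| between class 0 and class 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def centroid_accuracy(X_tr, y_tr, X_val, y_val, feats):
    """Accuracy of a nearest-centroid classifier restricted to `feats`.

    Assumption: this simple classifier is a stand-in, not the paper's SVM/NNW.
    """
    centroids = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X_tr, y_tr) if label == c]
        centroids[c] = [mean(row[j] for row in rows) for j in feats]
    hits = 0
    for row, label in zip(X_val, y_val):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in (0, 1)}
        hits += min(dist, key=dist.get) == label
    return hits / len(y_val)


def incremental_selection(X_tr, y_tr, X_val, y_val, ranked, max_k):
    """IFS: add ranked features one at a time, keep the smallest best-scoring prefix."""
    best_k, best_acc = 0, -1.0
    for k in range(1, max_k + 1):
        acc = centroid_accuracy(X_tr, y_tr, X_val, y_val, ranked[:k])
        if acc > best_acc:  # strict '>' keeps the smallest subset on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc


# Hypothetical toy data: 40 samples, 50 features, only features 0 and 1 informative.
rng = random.Random(0)
y = [(i // 2) % 2 for i in range(40)]
X = [[rng.gauss(0, 1) + (5.0 if label == 1 and j < 2 else 0.0) for j in range(50)]
     for label in y]
X_tr, y_tr = X[0::2], y[0::2]    # even-indexed samples for ranking and centroids
X_val, y_val = X[1::2], y[1::2]  # odd-indexed samples for scoring subsets

ranked = rank_by_t(X_tr, y_tr)
selected, best_acc = incremental_selection(X_tr, y_tr, X_val, y_val, ranked, max_k=10)
```

In the workflow the abstract describes, the ranked list passed to IFS would come from Nested-GA rather than directly from the t-test, and the subset would be scored with cross validation and then checked on an independent dataset.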
- Is Part Of:
- Expert systems with applications. Volume 121(2019)
- Journal:
- Expert systems with applications
- Issue:
- Volume 121(2019)
- Issue Display:
- Volume 121, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 121
- Issue:
- 2019
- Issue Sort Value:
- 2019-0121-2019-0000
- Page Start:
- 233
- Page End:
- 243
- Publication Date:
- 2019-05-01
- Subjects:
- Microarray gene expression -- DNA Methylation -- Colon cancer -- Lung cancer -- Machine learning -- Genetic algorithm -- Feature selection -- Support Vector Machine
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2018.12.022
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9402.xml