Simulated annealing aided genetic algorithm for gene selection from microarray data. (May 2023)
- Record Type:
- Journal Article
- Title:
- Simulated annealing aided genetic algorithm for gene selection from microarray data. (May 2023)
- Main Title:
- Simulated annealing aided genetic algorithm for gene selection from microarray data
- Authors:
- Marjit, Shyam
Bhattacharyya, Trinav
Chatterjee, Bitanu
Sarkar, Ram
- Abstract:
- In recent times, microarray gene expression datasets have gained significant popularity due to their usefulness to identify different types of cancer directly through bio-markers. These datasets possess a high gene-to-sample ratio and high dimensionality, with only a few genes functioning as bio-markers. Consequently, a significant amount of data is redundant, and it is essential to filter out important genes carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach to identify informative genes from high-dimensional datasets. SAGA utilizes a two-way mutation-based Simulated Annealing (SA) as well as Genetic Algorithm (GA) to ensure a good trade-off between exploitation and exploration of the search space, respectively. The naive version of GA often gets stuck in a local optimum and depends on the initial population, leading to premature convergence. To address this, we have blended a clustering-based population generation with SA to distribute the initial population of GA over the entire feature space. To further enhance the performance, we reduce the initial search space by a score-based filter approach called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. Comparison of SAGA with contemporary algorithms has shown that SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
Highlights: Application of Simulated Annealing aided Genetic Algorithm to solve the FS problem. Introduced a new multi-objective fitness function to evaluate a feature subset. Proposal of a new acceptance probability function in SA and enhancements in GA. Use of initial feature dropping using MICC for microarray and omics datasets. Clustering-based population initialization to avoid premature convergence of SAGA.
- Is Part Of:
- Computers in biology and medicine. Volume 158(2023)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 158(2023)
- Issue Display:
- Volume 158, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 158
- Issue:
- 2023
- Issue Sort Value:
- 2023-0158-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-05
- Subjects:
- Feature selection -- Genetic algorithm -- Simulated annealing -- Optimization algorithm -- Gene expression -- Microarray dataset
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2023.106854
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26899.xml