SEQENS: An ensemble method for relevant gene identification in microarray data. (January 2023)
- Record Type:
- Journal Article
- Title:
- SEQENS: An ensemble method for relevant gene identification in microarray data. (January 2023)
- Main Title:
- SEQENS: An ensemble method for relevant gene identification in microarray data
- Authors:
- Signol, François
Arnal, Laura
Navarro-Cerdán, J. Ramón
Llobet, Rafael
Arlandis, Joaquim
Perez-Cortes, Juan-Carlos
- Abstract:
- This paper describes an ensemble feature identification algorithm called SEQENS, and measures its capability to identify the relevant variables in a case-control study using a genetic expression microarray dataset. SEQENS uses Sequential Feature Search on multiple sample splits to select the variables showing a stronger relation with the target, and finally produces a variable relevance ranking. Although designed for feature identification, SEQENS could also serve as a basis for feature selection (classifier optimisation). Cliff, a ranking evaluation metric, is also presented and used to assess the feature identification algorithms when a groundtruth of relevant variables is available. To test performance, three types of synthetic groundtruths emulating fictitious diseases are generated from ten randomly chosen variables following different target pattern distributions using the E-MTAB-3732 dataset. Several sample-to-dimensionality ratios ranging from 300 to 3,000 observations and 854 to 54,675 variables are explored. SEQENS is compared with other state-of-the-art feature selection or identification methods. On average, the proposed algorithm identifies the relevant genes better and exhibits stronger stability. The algorithm is available to the community.
Highlights: New ensemble gene identification method applied to high-dimensional microarray data. Machine-learning-based method running feature selection on multiple data partitions. Comparative study of gene identification methods with a novel performance metric. Synthetic diseases created from interacting genes generate the groundtruth. The algorithm identifies the relevant genes better and offers stronger stability.
- Is Part Of:
- Computers in biology and medicine. Volume 152(2023)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 152(2023)
- Issue Display:
- Volume 152, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 152
- Issue:
- 2023
- Issue Sort Value:
- 2023-0152-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-01
- Subjects:
- Gene identification -- Feature selection -- Ensemble method -- Microarray data -- High dimensionality spaces
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2022.106413
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24845.xml
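
The abstract above describes the general idea of running a sequential feature search on multiple sample splits and combining the selections into a relevance ranking. The following is a minimal sketch of that idea only, not the authors' SEQENS implementation. The nearest-centroid scorer, the half/half random splits, the vote counting, and every function name here (`centroid_accuracy`, `forward_search`, `ensemble_ranking`, `make_toy_data`) are assumptions made for illustration.

```python
import random
from collections import Counter


def centroid_accuracy(train, test, features):
    """Held-out accuracy of a nearest-centroid classifier using only `features`."""
    sums = {0: [0.0] * len(features), 1: [0.0] * len(features)}
    counts = {0: 0, 1: 0}
    for x, y in train:
        counts[y] += 1
        for i, f in enumerate(features):
            sums[y][i] += x[f]
    if counts[0] == 0 or counts[1] == 0:
        return 0.0  # degenerate split: one class is missing from the training half
    cents = {c: [s / counts[c] for s in sums[c]] for c in (0, 1)}
    hits = 0
    for x, y in test:
        dist = {c: sum((x[f] - cents[c][i]) ** 2 for i, f in enumerate(features))
                for c in (0, 1)}
        hits += (0 if dist[0] <= dist[1] else 1) == y
    return hits / len(test)


def forward_search(train, test, n_features, max_selected):
    """Greedy sequential forward search: add the best feature while the score improves."""
    selected, best = [], 0.0
    while len(selected) < max_selected:
        candidates = [f for f in range(n_features) if f not in selected]
        score, f = max(
            ((centroid_accuracy(train, test, selected + [f]), f) for f in candidates),
            key=lambda t: (t[0], -t[1]),  # ties go to the lowest feature index
        )
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected


def ensemble_ranking(data, n_features, n_splits=20, max_selected=3, seed=0):
    """Rank features by how often they are selected across random sample splits."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        votes.update(forward_search(shuffled[:half], shuffled[half:],
                                    n_features, max_selected))
    return sorted(range(n_features), key=lambda f: (-votes[f], f))


def make_toy_data(n_samples, n_features, relevant, seed=0):
    """Hypothetical toy data: Gaussian noise, with one feature shifted for cases."""
    rng = random.Random(seed)
    data = []
    for i in range(n_samples):
        y = i % 2  # alternate controls (0) and cases (1)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        x[relevant] += 10.0 * y
        data.append((x, y))
    return data


if __name__ == "__main__":
    toy = make_toy_data(n_samples=100, n_features=6, relevant=2, seed=1)
    print(ensemble_ranking(toy, n_features=6))
```

Counting votes across splits is one simple way to turn many individual selections into a single ranking. The published method may weight or aggregate selections differently.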