SI(FS)2: Fast simultaneous instance and feature selection for datasets with many features. (March 2021)
- Record Type:
- Journal Article
- Title:
- SI(FS)2: Fast simultaneous instance and feature selection for datasets with many features. (March 2021)
- Main Title:
- SI(FS)2: Fast simultaneous instance and feature selection for datasets with many features
- Authors:
- García-Pedrajas, Nicolás
del Castillo, Juan A. Romero
Cerruela-García, Gonzalo
- Abstract:
- Highlights: We propose a new simultaneous instance and feature selection algorithm. The method achieves better storage reduction and testing error than previous approaches. The method is scalable to datasets with millions of features.
Abstract: Data reduction is becoming increasingly relevant due to the enormous amounts of data that are constantly being produced in many fields of research. Instance selection is one of the most widely used methods for this task. At the same time, most recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly hinders classification and recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is greatly improved when the complexity of the data is reduced. Thus, feature selection is also a widely used method for data reduction and for gaining an understanding of feature information. Although most methods address instance and feature selection separately, the two problems are interwoven, and benefits are expected from performing these two tasks jointly. However, few algorithms have been proposed for simultaneously addressing the tasks of instance and feature selection. Furthermore, most of those methods are based on complex heuristics that are very difficult to scale up even to moderately large datasets. This paper proposes a new algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection using a simple heuristic search and several scaling-up mechanisms that can be successfully applied to datasets with millions of features and instances. In the proposed method, a forward selection search is performed in the feature space jointly with the application of standard instance selection in a constructive subspace built stepwise. Several simplifications are adopted in the search to obtain a scalable method. An extensive comparison using 95 large datasets shows the usefulness of our method and its ability to deal with millions of instances and features simultaneously. The method is able to obtain better classification performance results than state-of-the-art approaches while achieving considerable data reduction.
- Is Part Of:
- Pattern recognition. Volume 111(2021)
- Journal:
- Pattern recognition
- Issue:
- Volume 111(2021)
- Issue Display:
- Volume 111, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 111
- Issue:
- 2021
- Issue Sort Value:
- 2021-0111-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-03
- Subjects:
- Instance selection -- Feature selection -- Evolutionary algorithms -- K-nearest neighbor rule
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2020.107723
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14921.xml