Improving partial mutual information-based input variable selection by consideration of boundary issues associated with bandwidth estimation. (September 2015)
- Record Type:
- Journal Article
- Title:
- Improving partial mutual information-based input variable selection by consideration of boundary issues associated with bandwidth estimation. (September 2015)
- Main Title:
- Improving partial mutual information-based input variable selection by consideration of boundary issues associated with bandwidth estimation
- Authors:
- Li, Xuyuan
Zecchin, Aaron C.
Maier, Holger R.
- Abstract:
- Input variable selection (IVS) is vital in the development of data-driven models. Among different IVS methods, partial mutual information (PMI) has shown significant promise, although its performance has been found to deteriorate for non-Gaussian and non-linear data. In this paper, the effectiveness of different approaches to improving PMI performance is investigated, focussing on boundary issues associated with bandwidth estimation. Boundary issues, associated with kernel-based density and residual computations within PMI, arise from the extension of symmetrical kernels beyond the feasible bounds of potential inputs, and result in an underestimation of kernel-based marginal and joint probability distribution functions in the PMI. In total, the effectiveness of 16 different approaches is tested on synthetically generated data and the results are used to develop preliminary guidelines for PMI IVS. By using the proposed guidelines, the correct inputs can be identified in 100% of trials, even if the data are highly non-linear or non-Gaussian.
Highlights:
We address the important problem of the performance of the PMI IVS influenced by boundary and bandwidth issues.
We develop approaches to improve the performance of the PMI IVS for non-Gaussian and non-linear problems.
Boundary resistant methods exhibit greater success than methods focussed on boundary correction.
The performance (selection accuracy) of PMI IVS is improved when accounting for boundary issues.
Preliminary guidelines of bandwidth selection are developed for PMI IVS and successfully validated on two semi-real studies.
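The boundary issue described in the abstract can be illustrated with a short, self-contained sketch. This is not the authors' PMI implementation: it assumes a Gaussian kernel, the Gaussian reference bandwidth rule h = 1.06σn^(-1/5), and data drawn uniformly from [0, 1] (true density 1). It uses reflection purely as one generic example of a boundary correction. The function names `kde`, `reflected_kde` and `reference_bandwidth` are hypothetical and exist only for this illustration.

```python
# Illustrative sketch only: boundary bias of a symmetric-kernel density estimate.
# Not the paper's method; kernel, bandwidth rule and data are assumptions.
import math
import random
import statistics

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gaussian_kernel(u):
    """Standard normal kernel: symmetric and unbounded, so it spills past any bound."""
    return math.exp(-0.5 * u * u) / SQRT_2PI


def kde(samples, x, h):
    """Plain Gaussian kernel density estimate of f(x) with bandwidth h."""
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (len(samples) * h)


def reflected_kde(samples, x, h, lower, upper):
    """Reflection boundary correction (one generic option): add mirror images of
    each sample about both bounds, folding kernel mass that spills outside
    [lower, upper] back inside."""
    total = 0.0
    for s in samples:
        total += gaussian_kernel((x - s) / h)
        total += gaussian_kernel((x - (2.0 * lower - s)) / h)
        total += gaussian_kernel((x - (2.0 * upper - s)) / h)
    return total / (len(samples) * h)


def reference_bandwidth(samples):
    """Gaussian reference rule h = 1.06 * sigma * n**(-1/5) (assumed here)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


rng = random.Random(42)
data = [rng.random() for _ in range(2000)]  # Uniform(0, 1): true density is 1 on [0, 1]
h = reference_bandwidth(data)

plain_at_0 = kde(data, 0.0, h)                          # at the lower bound
reflected_at_0 = reflected_kde(data, 0.0, h, 0.0, 1.0)  # same point, corrected
interior = kde(data, 0.5, h)                            # well away from both bounds

print(f"h = {h:.4f}")
print(f"plain KDE at x=0:     {plain_at_0:.3f}")
print(f"reflected KDE at x=0: {reflected_at_0:.3f}")
print(f"plain KDE at x=0.5:   {interior:.3f}")
```

At x = 0 the plain estimate should come out at roughly half the true density, because about half of each nearby kernel's mass lies outside the feasible range; at x = 0.5 it should be close to 1. Reflection brings the boundary estimate back to roughly 1, at the cost of a noisier estimate at the bound.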
- Is Part Of:
- Environmental modelling & software. Volume 71(2015:Sep.)
- Journal:
- Environmental modelling & software
- Issue:
- Volume 71(2015:Sep.)
- Issue Display:
- Volume 71 (2015)
- Year:
- 2015
- Volume:
- 71
- Issue Sort Value:
- 2015-0071-0000-0000
- Page Start:
- 78
- Page End:
- 96
- Publication Date:
- 2015-09
- Subjects:
- Artificial neural networks -- Data-driven models -- Partial mutual information -- Kernel density estimation -- Kernel bandwidth -- Boundary issues -- Hydrology and water resources -- Input variable selection
Environmental monitoring -- Computer programs -- Periodicals
Ecology -- Computer simulation -- Periodicals
Digital computer simulation -- Periodicals
Computer software -- Periodicals
Environmental Monitoring -- Periodicals
Computer Simulation -- Periodicals
Environment -- Monitoring -- Software -- Periodicals
Ecology -- Simulation methods -- Periodicals
Computer simulation -- Periodicals
Software -- Periodicals
Computer software
Digital computer simulation
Ecology -- Computer simulation
Environmental monitoring -- Computer programs
Periodicals
Electronic journals
363.70015118
- Journal URLs:
- http://www.sciencedirect.com/science/journal/13648152
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.envsoft.2015.05.013
- Languages:
- English
- ISSNs:
- 1364-8152
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3791.522800
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8044.xml