A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring. (15th July 2017)
- Record Type:
- Journal Article
- Title:
- A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring. (15th July 2017)
- Main Title:
- A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring
- Authors:
- Xia, Yufei
Liu, Chuanzhe
Li, YuYing
Liu, Nana
- Abstract:
- Highlights: A novel boosted tree model for credit scoring is proposed. A hyper-parameter optimization technique is developed based on TPE algorithm. The model is proved to outperform several baseline techniques. The model is validated on five datasets over five performance metrics. The feature importance scores and decision chart enhance model interpretation.
Abstract: Credit scoring is an effective tool for banks to properly guide decision profitably on granting loans. Ensemble methods, which according to their structures can be divided into parallel and sequential ensembles, have been recently developed in the credit scoring domain. These methods have proven their superiority in discriminating borrowers accurately. However, among the ensemble models, little consideration has been provided to the following: (1) highlighting the hyper-parameter tuning of base learner despite being critical to well-performed ensemble models; (2) building sequential models (i.e., boosting, as most have focused on developing the same or different algorithms in parallel); and (3) focusing on the comprehensibility of models. This paper aims to propose a sequential ensemble credit scoring model based on a variant of gradient boosting machine (i.e., extreme gradient boosting (XGBoost)). The model mainly comprises three steps. First, data pre-processing is employed to scale the data and handle missing values. Second, a model-based feature selection system based on the relative feature importance scores is utilized to remove redundant variables. Third, the hyper-parameters of XGBoost are adaptively tuned with Bayesian hyper-parameter optimization and used to train the model with selected feature subset. Several hyper-parameter optimization methods and baseline classifiers are considered as reference points in the experiment. Results demonstrate that Bayesian hyper-parameter optimization performs better than random search, grid search, and manual search. Moreover, the proposed model outperforms baseline models on average over four evaluation measures: accuracy, error rate, the area under the curve (AUC) H measure (AUC-H measure), and Brier score. The proposed model also provides feature importance scores and decision chart, which enhance the interpretability of credit scoring model.
- Is Part Of:
- Expert systems with applications. Volume 78(2017)
- Journal:
- Expert systems with applications
- Issue:
- Volume 78(2017)
- Issue Display:
- Volume 78, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 78
- Issue:
- 2017
- Issue Sort Value:
- 2017-0078-2017-0000
- Page Start:
- 225
- Page End:
- 241
- Publication Date:
- 2017-07-15
- Subjects:
- Credit scoring -- Boosted decision tree -- Bayesian hyper-parameter optimization
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2017.02.017
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2757.xml
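
Note on the evaluation measures: the abstract in this record names accuracy, error rate, an AUC-based measure, and the Brier score. The following is a minimal sketch of how such measures are commonly computed for a binary credit scoring model; it is not the authors' code. The function names, the 0.5 decision threshold, the label convention (1 = default, 0 = non-default), and the toy data are all assumptions made for illustration. The plain rank-based AUC shown here is a stand-in only, and the H measure mentioned in the abstract is not implemented.

```python
# Illustrative only: common definitions of the measures named in the abstract.
# Label convention (1 = default, 0 = non-default) and threshold are assumptions.

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases where the thresholded probability matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def error_rate(y_true, y_prob, threshold=0.5):
    """Complement of accuracy: share of misclassified cases."""
    return 1.0 - accuracy(y_true, y_prob, threshold)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six borrowers, true labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7]

if __name__ == "__main__":
    print("accuracy   :", round(accuracy(y_true, y_prob), 4))
    print("error rate :", round(error_rate(y_true, y_prob), 4))
    print("Brier score:", round(brier_score(y_true, y_prob), 4))
    print("AUC        :", round(auc(y_true, y_prob), 4))
```

Lower is better for the error rate and the Brier score, and higher is better for accuracy and AUC, so comparisons across models need to account for the direction of each measure.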