Enhanced financial fraud detection using cost‐sensitive cascade forest with missing value imputation. Issue 3 (28th July 2022)
- Record Type:
- Journal Article
- Title:
- Enhanced financial fraud detection using cost‐sensitive cascade forest with missing value imputation. Issue 3 (28th July 2022)
- Main Title:
- Enhanced financial fraud detection using cost‐sensitive cascade forest with missing value imputation
- Authors:
- Huang, Lukui
Abrahams, Alan
Ractham, Peter
- Abstract:
- Summary: Financial statement fraud is a global problem for investors, audit firms, regulators, and other stakeholders. Fraud detection can be regarded as a binary classification problem with a false negative being more expensive than a false positive. Although existing studies have made great efforts to detect fraud using various data‐mining techniques, the difference in misclassification costs is seldom considered. In this study, we propose a cost‐sensitive cascade forest (CSCF) for fraud detection, which places heavy penalty on false negative prediction and self‐adjusts the depth of a cascade forest according to the classifier's recall (i.e. the classifier's sensitivity). As missing values are ubiquitous in fraud research, we also explore the effect of selected missing data treatments on prediction performance, including complete case analysis, three selected classic statistical mechanisms (zero, mean, and modified mean imputation), and two machine learning (K‐nearest neighbor [KNN] and random forest [RF]) approaches. The experimental results show that the proposed CSCF significantly improves the fraud prediction in comparison with one of the latest fraud detection models using the RUSBoost algorithm. Comparing different missing value treatments, even though RUSBoost and CSCF perform well when using complete case analysis, we find that the best performance is achieved when CSCF is used with missing data imputed as zero. Such treatment further improves the performance, and results in an area under curve (AUC) score of 0.82 compared to the highest AUC (0.71) from the baseline model. Supplementary analysis further reveals that the low AUC of complete case analysis for the two examined models persists under different training sizes. Thus, our findings shed light on the potential benefits of missing value imputation for the model's performance for fraud detection.
- Is Part Of:
- Intelligent systems in accounting, finance and management. Volume 29:Issue 3(2022)
- Journal:
- Intelligent systems in accounting, finance and management
- Issue:
- Volume 29:Issue 3(2022)
- Issue Display:
- Volume 29, Issue 3 (2022)
- Year:
- 2022
- Volume:
- 29
- Issue:
- 3
- Issue Sort Value:
- 2022-0029-0003-0000
- Page Start:
- 133
- Page End:
- 155
- Publication Date:
- 2022-07-28
- Subjects:
- cost‐sensitive learning -- deep forest -- financial fraud detection -- missing value imputation
Accounting -- Data processing -- Periodicals
Business -- Data processing -- Periodicals
Expert systems (Computer science) -- Periodicals
Artificial intelligence -- Periodicals
657.028563
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/isaf.1517
- Languages:
- English
- ISSNs:
- 1055-615X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4531.832101
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23219.xml