A novel feature engineered-CatBoost-based supervised machine learning framework for electricity theft detection. (November 2021)
- Record Type:
- Journal Article
- Title:
- A novel feature engineered-CatBoost-based supervised machine learning framework for electricity theft detection. (November 2021)
- Main Title:
- A novel feature engineered-CatBoost-based supervised machine learning framework for electricity theft detection
- Authors:
- Hussain, Saddam
Mustafa, Mohd. Wazir
Jumani, Touqeer A.
Baloch, Shadi Khan
Alotaibi, Hammad
Khan, Ilyas
Khan, Afrasyab
- Abstract:
- This paper presents a novel supervised machine learning (ML)-based electricity theft detection approach using the feature-engineered CatBoost algorithm in conjunction with the SMOTETomek algorithm. Unlike previous literature, where missing observations in the data are either ignored or imputed with average values, this work uses the k-nearest neighbor (kNN) technique for missing-data imputation, thus achieving an accurate and realistic estimate of the missing data. To mitigate bias toward the majority data class, the proposed model uses the SMOTETomek algorithm, which counteracts this effect by maintaining a proper balance between over-sampling and under-sampling. The FeatuRe Extraction based on Scalable Hypothesis tests (FRESH) algorithm is applied at a later stage of the proposed non-technical loss (NTL) detection framework to extract and select the most relevant features from the provided dataset. Afterward, the model is trained using the CatBoost algorithm to classify consumers into two distinct categories, i.e., genuine and theft. Finally, the tree-SHAP algorithm is used to interpret the model's decisions for the corresponding predictions. To validate the efficacy of the proposed ML-based theft detection approach, its performance is compared with that of traditional gradient boosting ML algorithms such as XGBoost and LightGBM, ensemble bagging and boosting ML models, and other conventional ML models using five of the most widely used performance metrics, i.e., precision, accuracy, F1-score, Kappa, and MCC. The proposed technique achieved an accuracy of 93% and a detection rate of 92%, which is significantly higher than all the considered competing algorithms on the same dataset with the same hyperparameters. …
- Is Part Of:
- Energy reports. Volume 7(2021)
- Journal:
- Energy reports
- Issue:
- Volume 7(2021)
- Issue Display:
- Volume 7, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 7
- Issue:
- 2021
- Issue Sort Value:
- 2021-0007-2021-0000
- Page Start:
- 4425
- Page End:
- 4436
- Publication Date:
- 2021-11
- Subjects:
- CatBoost algorithm -- NTL detection -- Smart meters -- Feature engineering -- Machine learning model interpretation
Power resources -- Periodicals
Energy industries -- Periodicals
Power resources
Periodicals
Electronic journals
621.04205
- Journal URLs:
- http://www.sciencedirect.com/science/journal/23524847/
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.egyr.2021.07.008
- Languages:
- English
- ISSNs:
- 2352-4847
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20284.xml
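The abstract replaces mean imputation with k-nearest-neighbor imputation but does not state the distance metric, the value of k, or how rows are defined. The plain-Python sketch below assumes each row is one consumer's daily consumption series (None marks a missing reading), neighbours are other consumers, and distances use a NaN-aware Euclidean metric in the style of scikit-learn's KNNImputer. The function names and sample readings are hypothetical, not taken from the paper.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one consumer's readings; None = missing


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over readings present in both rows, scaled up by
    (total readings / shared readings) to compensate for the gaps."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * squared)


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing reading with the mean of that day's reading from the
    k nearest consumers that actually observed it (hypothetical helper)."""
    imputed = []
    for i, row in enumerate(rows):
        filled = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [reading for _, reading in donors[:k]]
            if not nearest:
                raise ValueError(f"no donor observed column {j}")
            filled.append(sum(nearest) / len(nearest))
        imputed.append(filled)
    return imputed


readings = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],  # missing reading on day 2
    [5.0, 6.0, 7.0],
    [0.9, 2.2, 2.9],
]
print(knn_impute(readings, k=2)[1])
```

In this toy input the first and fourth consumers are the two closest donors, so the gap is filled with the mean of their day-2 readings (about 2.1) rather than with the column average, which the outlying third consumer would pull upward.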
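SMOTETomek, as named in the abstract, combines SMOTE over-sampling of the minority (theft) class with removal of Tomek links, i.e. pairs of opposite-class samples that are each other's nearest neighbour. The stdlib-only sketch below illustrates the idea for a binary label; the helper names, the choice to balance the classes exactly before cleaning, and the choice to drop both members of each link are assumptions for illustration, not the paper's configuration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: List[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


def remove_tomek_links(X: List[Point], y: List[int]) -> Tuple[List[Point], List[int]]:
    """Drop both members of every Tomek link (mutual nearest neighbours
    that carry different labels)."""
    nearest = [
        min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    drop = set()
    for i, j in enumerate(nearest):
        if y[i] != y[j] and nearest[j] == i:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X: Sequence[Point], y: Sequence[int], minority_label: int = 1,
                k: int = 5, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Over-sample the minority class up to the majority count, then clean
    the class boundary by removing Tomek links."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    synthetic = smote(minority, max(n_majority - len(minority), 0), k, seed)
    X_bal = list(X) + synthetic
    y_bal = list(y) + [minority_label] * len(synthetic)
    return remove_tomek_links(X_bal, y_bal)
```

Over-sampling alone can push synthetic theft samples into the genuine region; the Tomek-link pass then removes ambiguous boundary pairs, which is the balance between over- and under-sampling that the abstract refers to.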
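FRESH extracts many candidate features from each time series, tests each feature's relevance to the target with a univariate hypothesis test, and keeps only the features that survive a Benjamini-Yekutieli false-discovery-rate correction. The sketch below mimics that flow with four hypothetical summary features, a two-sided Mann-Whitney U test (normal approximation) assumed for the binary genuine/theft label, and an assumed FDR level of 0.05. It is a toy illustration, not the tsfresh library or the paper's feature set.

```python
import math
import statistics
from typing import Dict, List, Sequence


def extract_features(series: Sequence[float]) -> Dict[str, float]:
    """A tiny, hypothetical feature set (FRESH implementations compute far more)."""
    return {
        "mean": statistics.mean(series),
        "std": statistics.pstdev(series),
        "max": max(series),
        "min": min(series),
    }


def mann_whitney_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value via the normal approximation
    (average ranks for ties, no tie correction of the variance)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_yekutieli(p_values: Dict[str, float], fdr: float = 0.05) -> List[str]:
    """Return the features whose null hypothesis is rejected at level `fdr`."""
    m = len(p_values)
    c_m = sum(1 / i for i in range(1, m + 1))  # correction for dependent tests
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    cutoff = 0
    for rank, (_, p) in enumerate(ordered, start=1):
        if p <= rank * fdr / (m * c_m):
            cutoff = rank
    return [name for name, _ in ordered[:cutoff]]


def select_relevant_features(series_list: List[Sequence[float]], labels: List[int],
                             fdr: float = 0.05) -> List[str]:
    """Label 0 = genuine, 1 = theft (hypothetical encoding)."""
    features = [extract_features(s) for s in series_list]
    p_values = {}
    for name in features[0]:
        genuine = [f[name] for f, label in zip(features, labels) if label == 0]
        theft = [f[name] for f, label in zip(features, labels) if label == 1]
        p_values[name] = mann_whitney_p(genuine, theft)
    return benjamini_yekutieli(p_values, fdr)
```

Because every feature is tested and the FDR correction accounts for the number of tests, adding many irrelevant candidate features raises the bar for each one instead of letting spurious features through by chance.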
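The abstract evaluates models with precision, accuracy, F1-score, Kappa, and MCC, and also reports a detection rate. The sketch below computes all of them from a binary confusion matrix with theft as the positive class. It assumes "detection rate" means recall (true-positive rate) on the theft class; the function name and the counts in the usage line are hypothetical and are not results from the paper.

```python
import math


def theft_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary classification metrics with 'theft' as the positive class."""
    n = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # assumed to be the reported "detection rate"
    accuracy = (tp + tn) / n
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient uses all four cells of the matrix
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "detection_rate": recall,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


print(theft_metrics(tp=40, fp=10, tn=40, fn=10))
```

With these made-up, symmetric counts, precision, detection rate, accuracy, and F1 are all 0.8, while Kappa and MCC are both 0.6: the chance-corrected metrics are stricter, which is why they are useful alongside accuracy on imbalanced theft data.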