A novel few-shot malware classification approach for unknown family recognition with multi-prototype modeling. Issue 106 (July 2021)
- Record Type:
- Journal Article
- Title:
- A novel few-shot malware classification approach for unknown family recognition with multi-prototype modeling. Issue 106 (July 2021)
- Main Title:
- A novel few-shot malware classification approach for unknown family recognition with multi-prototype modeling
- Authors:
- Wang, Peng
Tang, Zhijie
Wang, Junfeng
- Abstract:
- New malware variants appear rapidly and continuously increase the difficulty of classifying malware into the correct families. This raises two challenges for malware classification. The first is the scarce-samples problem: collecting a large volume of samples from a newly detected malware family to train a classifier can be extremely hard, and training on a small number of samples inevitably leads to overfitting. The second is the dynamic recognition problem: most widely adopted classifiers are trained on predefined, known malware families and lack the ability to incrementally identify novel families, so they must be retrained from scratch. To tackle these challenges, in this study, we employ a meta-learning-based few-shot learning (FSL) technique and propose a new few-shot malware classification model called SIMPLE (Supervised Infinite Mixture Prototypes LEarning). With the help of meta-learning, SIMPLE is trained on predefined malware families yet retains the ability to classify novel malware families it has never encountered. Furthermore, the prior knowledge learned via meta-learning can prevent overfitting caused by scarce samples. Working on API invocation sequences obtained from dynamic analysis, SIMPLE introduces multi-prototype modeling, generating multiple prototypes for each family to enhance generalization. This design is inspired by the observation that behaviors within the same family often match multiple subpatterns and follow a multimodal data distribution. In extensive experiments, SIMPLE achieves state-of-the-art few-shot malware classification performance and outperforms all the baselines. With only 5 samples per family, SIMPLE reaches an accuracy of 90% in the 5-way classification task on novel malware families, substantially addressing the problems of scarce samples and dynamic recognition. We also analyze why multi-prototype modeling and fast adaptation are effective, to provide more interpretability for the results.
- Is Part Of:
- Computers & security. Issue 106 (2021)
- Journal:
- Computers & security
- Issue:
- Issue 106 (2021)
- Issue Display:
- Volume 106, Issue 106 (2021)
- Year:
- 2021
- Volume:
- 106
- Issue:
- 106
- Issue Sort Value:
- 2021-0106-0106-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07
- Subjects:
- Malware classification -- Few-shot learning -- Multimodal distribution -- Multi-prototype -- Infinite mixture prototypes
Computer security -- Periodicals
Electronic data processing departments -- Security measures -- Periodicals
005.805
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01674048
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cose.2021.102273
- Languages:
- English
- ISSNs:
- 0167-4048
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.781000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17209.xml
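The multi-prototype idea summarized in the abstract (several prototypes per family instead of one class mean) can be illustrated with a minimal sketch. This is not the authors' SIMPLE implementation: it assumes embeddings of API invocation sequences are already computed (the learned encoder and meta-training are omitted), uses a DP-means-style hard clustering within each family with a hypothetical distance threshold `lam` that spawns a new prototype when a support point is too far from every existing one, and classifies a query by its nearest prototype. The function names `build_prototypes` and `classify` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of embeddings."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def build_prototypes(support: List[Tuple[Vector, str]], lam: float,
                     iters: int = 3) -> List[Tuple[List[float], str]]:
    """Cluster each family's support embeddings separately (assumed DP-means style).

    A family starts with one prototype (its mean); any point whose squared
    distance to every prototype of its family exceeds `lam` opens a new one.
    """
    by_label: Dict[str, List[Vector]] = defaultdict(list)
    for emb, label in support:
        by_label[label].append(emb)

    prototypes: List[Tuple[List[float], str]] = []
    for label, points in by_label.items():
        protos = [mean(points)]
        for _ in range(iters):
            clusters: List[List[Vector]] = [[] for _ in protos]
            for p in points:
                dists = [sq_dist(p, c) for c in protos]
                best = min(range(len(protos)), key=dists.__getitem__)
                if dists[best] > lam:
                    # Too far from all current prototypes: start a new subpattern.
                    protos.append(list(p))
                    clusters.append([p])
                else:
                    clusters[best].append(p)
            # Re-center each prototype on its members and drop empty clusters.
            protos = [mean(c) for c in clusters if c]
        prototypes.extend((c, label) for c in protos)
    return prototypes


def classify(query: Vector, prototypes: List[Tuple[List[float], str]]) -> str:
    """Return the family label of the nearest prototype."""
    _, label = min(prototypes, key=lambda pl: sq_dist(query, pl[0]))
    return label


# Toy 2-way episode: family "A" is bimodal, family "B" sits between its modes.
support = [((0, 0), "A"), ((0, 1), "A"), ((10, 0), "A"), ((10, 1), "A"),
           ((5, 3), "B"), ((5, 4), "B")]
multi = build_prototypes(support, lam=4.0)      # "A" gets two prototypes
single = build_prototypes(support, lam=1000.0)  # every family collapses to one
print(classify((5, 1.5), multi), classify((5, 1.5), single))  # B A
```

In the toy episode, the single mean of family "A" lands between its two modes, close to family "B", so the one-prototype classifier assigns the query to "A" even though no "A" sample is nearby; with one prototype per mode, the query goes to the genuinely closer family "B". This mirrors, in a very simplified form, the multimodal-distribution motivation stated in the abstract.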