Boosting training for PDF malware classifier via active learning. Issue 4 (16th May 2021)

Record Type:: Journal Article
Title:: Boosting training for PDF malware classifier via active learning. Issue 4 (16th May 2021)
Main Title:: Boosting training for PDF malware classifier via active learning
Authors:: Li, Yuanzhang
Wang, Xinxin
Shi, Zhiwei
Zhang, Ruyun
Xue, Jingfeng
Wang, Zhi
Other Names:: Caraffini, Fabio (guest editor)
Chiclana, Francisco (guest editor)
Moodley, Raymond (guest editor)
Gongora, Mario (guest editor)
Abstract:: Machine learning algorithms are widely used in cybersecurity applications, including spam and malware detection. In these applications, the machine learning model must withstand attacks by adversarial samples. How to train a robust machine learning model with few samples is therefore an active research problem. Portable Document Format (PDF) is a widely used file format and is often exploited as a vehicle for malicious behavior. Various PDF malware detectors based on machine learning have been proposed. However, labeling large‐scale data samples is time‐consuming and laborious. This paper aims to reduce the size of the training set while maintaining detection performance. We propose a novel PDF malware detection method that uses active learning to boost training. Specifically, we first clarify what uncertain samples mean in this paper and theoretically explain why these uncertain samples are effective for malware detection. Second, we present an active‐learning‐based malware detection model that uses mutual agreement analysis to select uncertain samples for data augmentation. The detector is retrained on the ground truth of the uncertain samples rather than on all test samples from the previous epoch, which not only improves detection performance but also reduces the detector's training time. We conduct 10 epochs of retraining experiments for comparison, using the uncertain samples and the whole test samples from the …
Is Part Of:: International journal of intelligent systems. Volume 37:Issue 4(2022)
Journal:: International journal of intelligent systems
Issue:: Volume 37:Issue 4(2022)
Issue Display:: Volume 37, Issue 4 (2022)
Year:: 2022
Volume:: 37
Issue:: 4
Issue Sort Value:: 2022-0037-0004-0000
Page Start:: 2803
Page End:: 2821
Publication Date:: 2021-05-16
Subjects:: active learning -- machine learning -- malware detection -- PDF
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
006.3
Journal URLs:: http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-111X
https://www.hindawi.com/journals/ijis
http://onlinelibrary.wiley.com/
DOI:: 10.1002/int.22451
Languages:: English
ISSNs:: 0884-8173
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 4542.310500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 21151.xml