Identification of malware families using stacking of textural features and machine learning. (1st December 2022)
- Record Type:
- Journal Article
- Title:
- Identification of malware families using stacking of textural features and machine learning. (1st December 2022)
- Main Title:
- Identification of malware families using stacking of textural features and machine learning
- Authors:
- Kumar, Sanjeev
Janet, B.
Neelakantan, Subramanian
- Abstract:
- The growing rate of malware and its complexity demand a new approach to detecting evolving malware instead of relying only on high-level features such as opcodes, API calls, control flow graphs, etc. Moreover, extracting these features is expensive and time-consuming because it requires disassembling or code execution.
This paper presents a new malware detection architecture using image analysis and machine learning that requires significantly fewer resources and does not depend on code disassembly or execution. The collected binary samples are systematically labeled using the AVClass tool and a clustering method based on their similar characteristics. Each labeled malicious program is visualized as a grayscale image to extract local and global textural features. The local textural features are extracted using SIFT, KAZE, and ORB descriptors, and global features are extracted using GIST, Hu Moments, and HOG. A bag of visual words (BoVW) algorithm is designed to select low-dimensional features and construct a local feature map of malware grayscale images. The feature maps from different image descriptors are stacked and used to train five machine learning algorithms, namely k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), and ExtraTree classifier. Two datasets are used for evaluation: the public MalImg dataset of 9339 samples from 25 families, and 690 real-world malware samples from 22 families collected on honeypots in the wild. Intensive experiments are performed on image descriptors, image ratios, vocabulary size, and computational time, measured as mean time to detection (MTTD), to devise the best detector. The proposed method obtained a test accuracy of 98.34% with stacked global features and 98.23% with stacked local features. A test accuracy of 92.75% with low false-positive rates is obtained for the real-world recent malware dataset. Experimental results reveal the efficacy of the proposed method in detecting polymorphic obfuscated malware. Finally, a comparison with other similar malware detection systems is presented.
Highlights: A new architecture of malware detection and classification using machine learning. Novel use of hybrid textural features and bag of visual words. A single image descriptor has difficulty capturing complex patterns of malware. …
- Is Part Of:
- Expert systems with applications. Volume 208(2022)
- Journal:
- Expert systems with applications
- Issue:
- Volume 208(2022)
- Issue Display:
- Volume 208, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 208
- Issue:
- 2022
- Issue Sort Value:
- 2022-0208-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-12-01
- Subjects:
- Malware detection -- Bag of visual words -- Machine learning -- Cyber Security -- Image descriptors
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2022.118073
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23331.xml
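
Illustrative note on the abstract: the pipeline it summarizes (binary visualized as a grayscale image, local descriptors mapped to a bag of visual words, feature maps stacked) can be sketched roughly as follows. This is a minimal standard-library sketch, not the authors' implementation. The image width of 256, the pre-built `vocabulary`, the nearest-word assignment by Euclidean distance, and stacking by simple concatenation are all assumptions made for illustration; the record does not specify them. In practice the descriptors would come from an extractor such as SIFT, KAZE, or ORB rather than being written by hand.

```python
import math


def bytes_to_grayscale(data, width=256):
    """Reshape raw bytes into rows of `width` pixels, one byte per pixel (0-255).

    The width is an assumed parameter; a short final row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the last row with zeros
        rows.append(row)
    return rows


def bovw_histogram(descriptors, vocabulary):
    """Build an L1-normalized bag-of-visual-words histogram.

    Each descriptor is assigned to its nearest visual word (Euclidean
    distance). `vocabulary` is assumed to be given, for example as cluster
    centres learned beforehand.
    """
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda i: math.dist(descriptor, vocabulary[i]),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


def stack_features(*feature_maps):
    """Stack several feature maps into one vector by concatenation (assumed)."""
    return [value for fmap in feature_maps for value in fmap]


if __name__ == "__main__":
    # Toy input: ten bytes laid out four pixels per row.
    image = bytes_to_grayscale(bytes(range(10)), width=4)
    print(image)

    # Toy 2-D descriptors and a two-word vocabulary (hypothetical values).
    vocabulary = [(0.0, 0.0), (10.0, 10.0)]
    descriptors = [(1.0, 0.0), (9.0, 9.0), (0.0, 2.0), (11.0, 10.0)]
    local_map = bovw_histogram(descriptors, vocabulary)

    # A second, made-up feature map stands in for another descriptor's output.
    print(stack_features(local_map, [0.25]))
```

The stacked vector would then be passed to a classifier such as k-NN, SVM, or Random Forest; that step is omitted here because it depends on a training set the record does not include.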