BinDeep: A deep learning approach to binary code similarity detection. (15th April 2021)
- Record Type:
- Journal Article
- Title:
- BinDeep: A deep learning approach to binary code similarity detection. (15th April 2021)
- Main Title:
- BinDeep: A deep learning approach to binary code similarity detection
- Authors:
- Tian, Donghai
Jia, Xiaoqi
Ma, Rui
Liu, Shuke
Liu, Wenjing
Hu, Changzhen
- Abstract:
- Highlights: We propose a novel deep learning based solution for binary code similarity detection. We use the instruction embedding model to vectorize the extracted instructions. We apply a deep learning classification model to identify the types of functions to be compared. We utilize the hybrid siamese neural network to measure the binary code similarity. We conduct extensive experiments to evaluate the effectiveness of our approach.
Abstract: Binary code similarity detection (BCSD) plays an important role in malware analysis and vulnerability discovery. Existing methods mainly rely on the expert's knowledge for the BCSD, which may not be reliable in some cases. More importantly, the detection accuracy (or performance) of these methods are not so satisfied. To address these issues, we propose BinDeep, a deep learning approach for binary code similarity detection. This method firstly extracts the instruction sequence from the binary function and then uses the instruction embedding model to vectorize the instruction features. Next, BinDeep applies a Recurrent Neural Network (RNN) deep learning model to identify the specific types of two functions for later comparison. According to the type information, BinDeep selects the corresponding deep learning model for similarity comparison. Specifically, BinDeep uses the Siamese neural networks, which combine the LSTM and CNN to measure the similarities of two target functions. Different from the traditional deep learning model, our hybrid model takes advantage of the CNN spatial structure learning and the LSTM sequence learning. The evaluation shows that our approach can achieve good BCSD between cross-architecture, cross-compiler, cross-optimization, and cross-version binary code. …
- Is Part Of:
- Expert systems with applications. Volume 168(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 168(2021)
- Issue Display:
- Volume 168, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 168
- Issue:
- 2021
- Issue Sort Value:
- 2021-0168-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04-15
- Subjects:
- Binary code -- Deep learning -- Similarity comparison -- Siamese neural network -- LSTM -- CNN
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2020.114348
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15544.xml
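
Illustrative note on the abstract: the Siamese comparison it describes applies one shared encoder to both functions and then scores the two resulting vectors. The sketch below shows only that general shape and is not the authors' implementation. Every name in it is hypothetical; the hash-derived instruction vectors stand in for a trained instruction embedding, and mean pooling stands in for the LSTM and CNN branches, neither of which is reproduced here.

```python
import hashlib
import math
from typing import List

DIM = 16  # embedding width, chosen arbitrarily for this sketch


def embed_instruction(instr: str) -> List[float]:
    """Hypothetical stand-in for a trained instruction embedding.

    Maps an instruction string to a deterministic vector in [-1, 1]^DIM
    derived from its SHA-256 digest. Nothing is learned here.
    """
    digest = hashlib.sha256(instr.encode("utf-8")).digest()
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:DIM]]


def encode_function(instructions: List[str]) -> List[float]:
    """Shared encoder used for both inputs (the 'Siamese' part).

    Mean pooling is an assumption standing in for the LSTM/CNN branches.
    """
    if not instructions:
        return [0.0] * DIM
    vectors = [embed_instruction(i) for i in instructions]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def siamese_similarity(func_a: List[str], func_b: List[str]) -> float:
    """Encode both instruction sequences with the same encoder, then compare."""
    return cosine_similarity(encode_function(func_a), encode_function(func_b))


if __name__ == "__main__":
    # Made-up instruction sequences, for illustration only.
    f1 = ["push rbp", "mov rbp, rsp", "xor eax, eax", "pop rbp", "ret"]
    f2 = ["stp x29, x30, [sp, #-16]!", "mov w0, wzr", "ldp x29, x30, [sp], #16", "ret"]
    print(siamese_similarity(f1, f1))
    print(siamese_similarity(f1, f2))
```

Because the encoder here is untrained, the scores carry no meaning for real binaries; the sketch only shows why sharing one encoder makes the comparison symmetric in its two inputs.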