SELF: A method of searching for library functions in stripped binary code. Issue 111 (December 2021)
- Record Type:
- Journal Article
- Title:
- SELF: A method of searching for library functions in stripped binary code. Issue 111 (December 2021)
- Main Title:
- SELF: A method of searching for library functions in stripped binary code
- Authors:
- Liu, Xueqian
Cao, Shoufeng
Cao, Zhenzhong
Gao, Qu
Wan, Lin
Wang, Fengyu
- Abstract:
- Highlights: A novel scheme named SELF was proposed to recognize library functions used in stripped binary code. The key of SELF is combining the co-occurrence matrix and CAE to extract the semantic features of functions, which are resistant to version diversity. We crawled a large amount of software and libraries and constructed a binary function dataset for training and evaluation. Debugging symbolic information is retained when compiling these open-source codes, which are finally utilized to verify the recognition results. We implemented a prototype of SELF and determined its parameters based on the collected dataset. Through an experimental evaluation, we confirmed SELF's significant advantage over classic BINDIFF in library function recognition, especially when the version gap is large.
Abstract: During software development, numerous third-party library functions are often reused. Accurately recognizing library functions reused in software is of great significance for some security scenarios, such as the detection of known vulnerabilities and reverse analyses of malware. An optional method for recognizing library functions is matching the functions in the library to those in the target software. However, due to the diversity of function library versions, compilers, build options, etc., there are differences between the two corresponding functions. Recognizing library functions used in target software precisely is still a challenging task. In this paper, we propose a novel method named SELF (SEarch for Library Functions) to recognize library functions used in target software. In SELF, the function is represented with a co-occurrence matrix and encoded by a convolutional auto-encoder (CAE). Then, the similarity between two functions is detected using the generated bottleneck features. This scheme focuses on the discriminative semantic features; thus, this method can not only distinguish different functions but also tolerate the subtle differences between two pairing functions, which is specifically required for library function recognition. We collected 451 software projects, including approximately 3 million functions, to train and evaluate SELF. The experimental results show that SELF performs well in both Recall@1 and Recall@5. Especially when the library version gap is large, SELF significantly outperforms classic BINDIFF. In addition, SELF shows good computational efficiency.
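The pipeline summarized in the abstract (bigram co-occurrence matrix, feature comparison, Recall@k) can be illustrated with a minimal sketch. This record does not give the paper's instruction vocabulary, matrix layout, CAE architecture, or similarity measure, so everything below is an assumption for illustration: the toy opcode vocabulary, the function names, the use of cosine similarity, and the use of the flattened matrix in place of the CAE bottleneck features.

```python
import numpy as np

# Hypothetical toy opcode vocabulary; the paper's real vocabulary is not given here.
VOCAB = ["push", "mov", "call", "add", "ret"]

def cooccurrence_matrix(opcodes, vocab=VOCAB):
    """Count adjacent opcode pairs (bigrams) in one function's instruction sequence."""
    index = {op: i for i, op in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)), dtype=np.float32)
    for first, second in zip(opcodes, opcodes[1:]):
        if first in index and second in index:
            matrix[index[first], index[second]] += 1.0
    return matrix

def features(opcodes):
    """Stand-in for the CAE bottleneck features: just the flattened matrix (assumption)."""
    return cooccurrence_matrix(opcodes).ravel()

def cosine(u, v):
    """Cosine similarity; an assumed measure, not necessarily the paper's."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def recall_at_k(queries, library, truth, k):
    """Fraction of query functions whose true library match is among the top-k candidates."""
    hits = 0
    for name, vec in queries.items():
        ranked = sorted(library, key=lambda cand: cosine(vec, library[cand]), reverse=True)
        if truth[name] in ranked[:k]:
            hits += 1
    return hits / len(queries)

# Hypothetical library functions and slightly different "other version" target functions.
library = {
    "copy_like": features(["push", "mov", "mov", "call", "ret"]),
    "sum_like": features(["push", "add", "add", "add", "ret"]),
}
targets = {
    "f1": features(["push", "mov", "mov", "mov", "call", "ret"]),
    "f2": features(["push", "add", "add", "ret"]),
}
truth = {"f1": "copy_like", "f2": "sum_like"}

print(recall_at_k(targets, library, truth, k=1))
```

In this toy setup each target differs from its library counterpart by one instruction, mimicking a small version gap; the bigram counts still overlap enough for the correct function to rank first.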
- Is Part Of:
- Computers & security. Issue 111(2021)
- Journal:
- Computers & security
- Issue:
- Issue 111(2021)
- Issue Display:
- Volume 111, Issue 111 (2021)
- Year:
- 2021
- Volume:
- 111
- Issue:
- 111
- Issue Sort Value:
- 2021-0111-0111-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-12
- Subjects:
- Software reverse engineering -- Binary code analysis -- Library function -- Semantic feature -- Bigram -- Co-occurrence matrix -- Convolutional autoencoder
Computer security -- Periodicals
Electronic data processing departments -- Security measures -- Periodicals
005.805
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01674048
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cose.2021.102473
- Languages:
- English
- ISSNs:
- 0167-4048
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.781000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19827.xml