Detecting code vulnerabilities by learning from large-scale open source repositories. (September 2022)
- Record Type:
- Journal Article
- Title:
- Detecting code vulnerabilities by learning from large-scale open source repositories. (September 2022)
- Main Title:
- Detecting code vulnerabilities by learning from large-scale open source repositories
- Authors:
- Xu, Rongze
Tang, Zhanyong
Ye, Guixin
Wang, Huanting
Ke, Xin
Fang, Dingyi
Wang, Zheng
- Abstract:
- Machine learning methods are widely used to identify common, repeatedly occurring bugs and code vulnerabilities. The performance of a machine-learned model is bounded by the quality and quantity of its training data and by the model's ability to extract and capture the essential information of the problem domain. Unfortunately, there is a shortage of high-quality samples for training code vulnerability detection models, and existing machine learning methods are inadequate at capturing code vulnerability patterns. We present Developer, a novel learning framework for building code vulnerability detection models. To address the data scarcity challenge, Developer automatically gathers training samples from open-source projects and applies constraint rules to filter out noisy data, improving the quality of the collected samples. The collected data provides many real-world vulnerable code samples for training, complementing those available in standard vulnerability databases. To build an effective code vulnerability detection model, Developer employs a convolutional neural network architecture with attention mechanisms to extract a code representation from the program's abstract syntax tree. The extracted program representation is then fed to a downstream network, a bidirectional long short-term memory (BiLSTM) architecture, which predicts whether the target code contains a vulnerability. We apply Developer to identify vulnerabilities at the program source-code level. Our evaluation shows that Developer outperforms state-of-the-art methods by uncovering more vulnerabilities with a lower false-positive rate.
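The abstract names the model's stages (abstract syntax tree, convolution with attention, BiLSTM, binary prediction) but not their details. As a minimal, untrained sketch of how data could flow through such a pipeline, the plain-Python code below assumes: Python's own `ast` module as a stand-in parser (the paper's parser is not described in this record), a single bias-free convolution layer, a simple softmax attention that rescales each convolutional feature vector by its weight, and illustrative layer sizes. All names (`ast_node_sequence`, `attend`, `vulnerability_score`, and the rest) are hypothetical, and the random weights mean the output carries no meaning until a model is trained on labelled samples.

```python
import ast
import math
import random

# Minimal, UNTRAINED sketch of an AST -> CNN + attention -> BiLSTM -> probability
# pipeline. All sizes, weights, and helper names are illustrative assumptions.
EMB, FILTERS, KERNEL, HIDDEN = 8, 8, 3, 4
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 0.5 * (1.0 + math.tanh(x / 2.0))  # overflow-free form of 1 / (1 + e^-x)


def ast_node_sequence(source):
    """Flatten an AST into node-type names in breadth-first order (ast.walk).
    Python's own parser stands in for the paper's (unspecified) front end."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def embed(node_type):
    # Deterministic random vector per node type (stand-in for a learned embedding table).
    local = random.Random(node_type)
    return [local.uniform(-0.5, 0.5) for _ in range(EMB)]


conv_w = [rand_matrix(FILTERS, EMB) for _ in range(KERNEL)]  # one matrix per kernel offset
att_v = [rng.uniform(-0.5, 0.5) for _ in range(FILTERS)]     # attention scoring vector


def conv1d_relu(seq):
    """Valid 1-D convolution (no bias) over the embedded node sequence, then ReLU."""
    out = []
    for t in range(len(seq) - KERNEL + 1):
        acc = [0.0] * FILTERS
        for k in range(KERNEL):
            for f, val in enumerate(matvec(conv_w[k], seq[t + k])):
                acc[f] += val
        out.append([max(0.0, a) for a in acc])
    return out


def attend(feats):
    """Softmax attention over positions; each feature vector is scaled by its weight."""
    scores = [sum(a * x for a, x in zip(att_v, f)) for f in feats]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [[w * x for x in f] for w, f in zip(weights, feats)], weights


def make_lstm():
    # Input, forget, output, and candidate gates: (input weights, recurrent weights).
    return {gate: (rand_matrix(HIDDEN, FILTERS), rand_matrix(HIDDEN, HIDDEN)) for gate in "ifog"}


def run_lstm(params, seq):
    """Run a bias-free LSTM over seq and return the final hidden state."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x in seq:
        pre = {gate: [a + b for a, b in zip(matvec(W, x), matvec(U, h))]
               for gate, (W, U) in params.items()}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h


fwd_lstm, bwd_lstm = make_lstm(), make_lstm()
out_w = [rng.uniform(-0.5, 0.5) for _ in range(2 * HIDDEN)]


def vulnerability_score(source):
    """Return an (untrained, hence meaningless) probability that `source` is vulnerable."""
    nodes = ast_node_sequence(source)
    nodes += ["<pad>"] * max(0, KERNEL - len(nodes))  # convolution needs >= KERNEL positions
    weighted, _ = attend(conv1d_relu([embed(n) for n in nodes]))
    # Bidirectional: concatenate final states of a forward and a reversed-sequence LSTM.
    h = run_lstm(fwd_lstm, weighted) + run_lstm(bwd_lstm, weighted[::-1])
    return sigmoid(sum(w * x for w, x in zip(out_w, h)))


snippet = "buf = [0] * 8\nbuf[idx] = value\n"
print(f"P(vulnerable) = {vulnerability_score(snippet):.3f}")
```

In a working detector the embeddings, convolution, attention, LSTM, and output weights would all be learned from labelled vulnerable and non-vulnerable samples; the sketch only illustrates how a program moves from tree to sequence to a single probability.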
- Is Part Of:
- Journal of information security and applications. Volume 69(2023)
- Journal:
- Journal of information security and applications
- Issue:
- Volume 69(2023)
- Issue Display:
- Volume 69, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 69
- Issue:
- 2023
- Issue Sort Value:
- 2023-0069-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Code vulnerability detection -- Deep learning -- Attention mechanism -- Software vulnerability
Computer security -- Periodicals
Information technology -- Security measures -- Periodicals
005.805
- Journal URLs:
- http://www.sciencedirect.com/
- DOI:
- 10.1016/j.jisa.2022.103293
- Languages:
- English
- ISSNs:
- 2214-2126
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23334.xml