Deobfuscation, unpacking, and decoding of obfuscated malicious JavaScript for machine learning models detection performance improvement. Issue 3 (17th July 2020)

Record Type:: Journal Article
Title:: Deobfuscation, unpacking, and decoding of obfuscated malicious JavaScript for machine learning models detection performance improvement. Issue 3 (17th July 2020)
Main Title:: Deobfuscation, unpacking, and decoding of obfuscated malicious JavaScript for machine learning models detection performance improvement
Authors:: Ndichu, Samuel
Kim, Sangwook
Ozawa, Seiichi
Abstract:: Obfuscation is rampant in both benign and malicious JavaScript (JS) codes. It generates an obscure and undetectable code that hinders comprehension and analysis. Therefore, accurate detection of JS codes that masquerade as innocuous scripts is vital. The existing deobfuscation methods assume that a specific tool can recover an original JS code entirely. For a multi‐layer obfuscation, general tools realize a formatted JS code, but some sections remain encoded. For the detection of such codes, this study performs Deobfuscation, Unpacking, and Decoding (DUD‐preprocessing) by function redefinition using a Virtual Machine (VM), a JS code editor, and a Python int_to_str() function to facilitate feature learning by the FastText model. The learned feature vectors are passed to a classifier model that judges the maliciousness of a JS code. In performance evaluation, the authors use Hynek Petrak's dataset for obfuscated malicious JS codes and the SRILAB dataset and the Majestic Million service top 10,000 websites for obfuscated benign JS codes. They then compare the performance to other models on the detection of DUD‐preprocessed obfuscated malicious JS codes. Their experimental results show that the proposed approach enhances feature learning and provides improved accuracy in the detection of obfuscated malicious JS codes.
Is Part Of:: CAAI transactions on intelligence technology. Volume 5:Issue 3(2020)
Journal:: CAAI transactions on intelligence technology
Issue:: Volume 5:Issue 3(2020)
Issue Display:: Volume 5, Issue 3 (2020)
Year:: 2020
Volume:: 5
Issue:: 3
Issue Sort Value:: 2020-0005-0003-0000
Page Start:: 184
Page End:: 192
Publication Date:: 2020-07-17
Subjects:: invasive software -- Java -- Internet -- feature extraction -- text analysis -- vectors -- learning (artificial intelligence)
formatted JS code -- deobfuscation methods -- unpacking -- DUD‐preprocessed obfuscated malicious JS codes -- term frequency–inverse document frequency model -- long short‐term memory model -- paragraph vector models -- obfuscated benign JS codes -- learned feature vectors -- FastText model -- JS code editor -- multilayer obfuscation -- original JS code -- undetectable code -- obscure code -- machine learning models detection
Artificial intelligence -- Periodicals
Computer science -- Periodicals
Artificial intelligence
Computer science
Electronic journals
Periodicals
006.305
Journal URLs:: https://digital-library.theiet.org/content/journals/trit
https://ietresearch.onlinelibrary.wiley.com/journal/24682322
http://search.ebscohost.com/login.aspx?direct=true&site=edspub-live&scope=site&type=44&db=edspub&authtype=ip,guest&custid=ns011247&groupid=main&profile=eds&bquery=AN%2010129651
http://www.sciencedirect.com/
DOI:: 10.1049/trit.2020.0026
Languages:: English
ISSNs:: 2468-6557
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 2943.720000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 16699.xml