Adjusting lexical features of actual proxy logs for intrusion detection. (February 2020)
- Record Type:
- Journal Article
- Title:
- Adjusting lexical features of actual proxy logs for intrusion detection. (February 2020)
- Main Title:
- Adjusting lexical features of actual proxy logs for intrusion detection
- Authors:
- Mimura, Mamoru
- Abstract:
- Modern HTTP-based malware imitates benign traffic to evade detection. To detect unseen malicious traffic, we proposed a linguistic-based detection method for proxy logs. This method automatically extracts words as feature vectors with natural language techniques, and discriminates between benign traffic and malicious traffic. The previous method generates a corpus from all the extracted words, which include trivial words. To generate a discriminative feature representation, the corpus has to be effectively summarized. In actual proxy logs, benign traffic is dominant and overwhelms the malicious feature representation. Hence, an imbalance between benign and malicious traffic occurs. Moreover, a malicious paragraph might be mixed with some benign proxy logs. Therefore, the previous method does not perform accurately in practical environments. This paper demonstrates that our previous method is not effective on actual proxy logs because of the imbalance. To mitigate the imbalance, our method adjusts lexical features of actual proxy logs based on word importance. Unlike traditional sampling techniques, our method does not adjust the number of samples in each class. We performed cross-validation and timeline analysis with captured pcap files from Exploit Kit and actual proxy logs. The experimental results show that our method could detect unseen malicious traffic in actual proxy logs. Moreover, we examine the effectiveness of mixing benign logs at various proportions. The best F-measure reaches 0.95 in the timeline analysis.
- Is Part Of:
- Journal of information security and applications. Volume 50(2020)
- Journal:
- Journal of information security and applications
- Issue:
- Volume 50(2020)
- Issue Display:
- Volume 50, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 50
- Issue:
- 2020
- Issue Sort Value:
- 2020-0050-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-02
- Subjects:
- Intrusion detection -- Machine learning -- TF-IDF -- Paragraph vector -- Doc2vec
Computer security -- Periodicals
Information technology -- Security measures -- Periodicals
005.805
- Journal URLs:
- http://www.sciencedirect.com/
- DOI:
- 10.1016/j.jisa.2019.102408
- Languages:
- English
- ISSNs:
- 2214-2126
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 12519.xml
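
Illustrative note on the abstract: the record does not describe how "word importance" is computed, but the subject terms list TF-IDF. The following is a minimal sketch, assuming TF-IDF as the importance score and a simple "keep the top-k words per log line" rule. The tokenizer, the function names, the `top_k` parameter, and the sample log lines are all hypothetical and are not taken from the article.

```python
import math
import re
from collections import Counter


def tokenize(log_line):
    # Assumption: split a proxy log line into lowercase "words" on
    # whitespace and common URL delimiters.
    return [t for t in re.split(r"[\s/?&=:.]+", log_line.lower()) if t]


def tfidf_scores(documents):
    """Return one {word: tf-idf score} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def important_words(documents, top_k):
    """Union of the top_k highest-scoring words of each document."""
    vocab = set()
    for doc_scores in tfidf_scores(documents):
        # Sort by descending score, then alphabetically to break ties.
        ranked = sorted(doc_scores, key=lambda w: (-doc_scores[w], w))
        vocab.update(ranked[:top_k])
    return vocab


def summarize(doc, vocab):
    """Drop every word that is not in the retained vocabulary."""
    return [w for w in doc if w in vocab]


# Hypothetical proxy log lines, for illustration only.
logs = [
    "GET http://example.com/index.html 200",
    "GET http://example.com/news/index.html 200",
    "GET http://malicious.test/gate.php?id=42 200",
]
docs = [tokenize(line) for line in logs]
vocab = important_words(docs, top_k=2)
reduced = [summarize(d, vocab) for d in docs]
```

Words that occur in every line (here `get`, `http`, and `200`) receive a score of zero and fall out of the vocabulary, which is the sense in which a corpus is "summarized" in this sketch. The article's actual adjustment procedure may differ.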