Multi-datasource machine learning in intrusion detection: Packet flows, system logs and host statistics. (August 2022)
- Record Type:
- Journal Article
- Title:
- Multi-datasource machine learning in intrusion detection: Packet flows, system logs and host statistics. (August 2022)
- Main Title:
- Multi-datasource machine learning in intrusion detection: Packet flows, system logs and host statistics
- Authors:
- Lin, Ying-Dar
Wang, Ze-Yu
Lin, Po-Ching
Nguyen, Van-Linh
Hwang, Ren-Hung
Lai, Yuan-Cheng - Abstract:
- Abstract: This work compares the performance of different combinations of data sources for intrusion detection in depth. To learn and distinguish between normal and malicious behavior, we use machine learning algorithms and train three typical models on three kinds of datasets: system logs, packet flows and host statistics. Unlike other studies, our study captures and monitors the behavior from multiple data sources in order to catch security attacks. Our aim is to figure out how to build the most effective dataset for machine learning with a combination of multiple sources. However, since there are no such datasets which have been generated from multiple sources for given attacks, we show how to build and generate a dataset with three data sources. We then compare the F1 score of the detection by applying machine learning algorithms for various combinations of the data sources. Our evaluation results show that the dataset of host statistics results in better performance (0.91) than traffic flows (0.63) and system logs (0.44) because it has the highest average F1-score in the three stages of attacks, while the other datasets may have poor F1-scores in some of the stages, particularly in the stage of impact. However, in the initial access stage of attacks, the dataset of logs performs the best (0.94), and the packet flows are suitable for detecting network DoS attacks (0.82). Furthermore, running this detection with all three data sources results in minor overheads of at mostAbstract: This work compares the performance of different combinations of data sources for intrusion detection in depth. To learn and distinguish between normal and malicious behavior, we use machine learning algorithms and train three typical models on three kinds of datasets: system logs, packet flows and host statistics. Unlike other studies, our study captures and monitors the behavior from multiple data sources in order to catch security attacks. Our aim is to figure out how to build the most effective dataset for machine learning with a combination of multiple sources. However, since there are no such datasets which have been generated from multiple sources for given attacks, we show how to build and generate a dataset with three data sources. We then compare the F1 score of the detection by applying machine learning algorithms for various combinations of the data sources. Our evaluation results show that the dataset of host statistics results in better performance (0.91) than traffic flows (0.63) and system logs (0.44) because it has the highest average F1-score in the three stages of attacks, while the other datasets may have poor F1-scores in some of the stages, particularly in the stage of impact. However, in the initial access stage of attacks, the dataset of logs performs the best (0.94), and the packet flows are suitable for detecting network DoS attacks (0.82). Furthermore, running this detection with all three data sources results in minor overheads of at most 2.1% CPU utilization. Finally, we analyze the important features of each model, such as the number of logs generated by apache-access, in.telnetd and postfix in the dataset of logs, SrcBytes and TotBytes in the dataset of flows, and MINFLT, VSTEXT and RSIZE in the dataset of statistics. … (more)
- Is Part Of:
- Journal of information security and applications. Volume 68(2022)
- Journal:
- Journal of information security and applications
- Issue:
- Volume 68(2022)
- Issue Display:
- Volume 68, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 68
- Issue:
- 2022
- Issue Sort Value:
- 2022-0068-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-08
- Subjects:
- System log -- Traffic flow -- Host statistics -- Machine learning -- Feature engineering -- Intrusion detection
Computer security -- Periodicals
Information technology -- Security measures -- Periodicals
005.805 - Journal URLs:
- http://www.sciencedirect.com/ ↗
- DOI:
- 10.1016/j.jisa.2022.103248 ↗
- Languages:
- English
- ISSNs:
- 2214-2126
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 22597.xml