CREME: A toolchain of automatic dataset collection for machine learning in intrusion detection. (1st November 2021)
- Record Type:
- Journal Article
- Title:
- CREME: A toolchain of automatic dataset collection for machine learning in intrusion detection. (1st November 2021)
- Main Title:
- CREME: A toolchain of automatic dataset collection for machine learning in intrusion detection
- Authors:
- Bui, Huu-Khoi
Lin, Ying-Dar
Hwang, Ren-Hung
Lin, Po-Ching
Nguyen, Van-Linh
Lai, Yuan-Cheng
- Abstract:
- Abstract: Intrusion detection is one of the most common approaches for addressing security attacks in modern networks. However, given the increasing diversity of attack behaviors, efficient detection becomes more challenging. Machine learning (ML) has recently dominated as one of the most promising techniques to improve detection accuracy for intrusion detection systems (IDS). With ML-based approaches, a quality dataset for training holds the key to gain high detection performance. Unfortunately, there are few methods to assess the dataset quality, and specifically for ML training. This work presents an automated toolchain, termed CREME (Configuration, REproduction, Multi-dataset, and Evaluation), to generate a dataset and measure its quality and efficiency. CREME integrates various tools to automate all stages of configuration, attack and benign behavior reproduction, data collection, feature extraction, data labeling, and evaluation. CREME can also automatically collect and generate a dataset from multiple sources such as accounting, network traffic, and system logs. Compared with the available datasets in the same category, experiment results show that the datasets generated by CREME contribute up to 20% better performance to ML-based IDS in terms of coverage. They also have significantly better efficiency than most other datasets. The CREME source code is available at https://github.com/buihuukhoi/CREME. Highlights: An open-source automated framework for collecting multiple sources datasets. Generated dataset provides better coverage and efficiency. Generated dataset significantly enriches data for causality-inspired Machine Learning/Deep Learning-based IDS research. …
- Is Part Of:
- Journal of network and computer applications. Volume 193(2021)
- Journal:
- Journal of network and computer applications
- Issue:
- Volume 193(2021)
- Issue Display:
- Volume 193, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 193
- Issue:
- 2021
- Issue Sort Value:
- 2021-0193-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11-01
- Subjects:
- Dataset toolchain -- Intrusion detection -- Machine learning -- Security dataset -- Dataset generation -- Dataset evaluation -- Multiple data sources
Microcomputers -- Periodicals
Computer networks -- Periodicals
Application software -- Periodicals
Micro-ordinateurs -- Périodiques
Réseaux d'ordinateurs -- Périodiques
Logiciels d'application -- Périodiques
Application software
Computer networks
Microcomputers
Periodicals
004.05
004
- Journal URLs:
- http://www.sciencedirect.com/science/journal/10848045
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.jnca.2021.103212
- Languages:
- English
- ISSNs:
- 1084-8045
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5021.410600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19698.xml