Inter-dataset generalization strength of supervised machine learning methods for intrusion detection. (October 2020)
- Record Type:
- Journal Article
- Title:
- Inter-dataset generalization strength of supervised machine learning methods for intrusion detection. (October 2020)
- Main Title:
- Inter-dataset generalization strength of supervised machine learning methods for intrusion detection
- Authors:
- D'hooge, Laurens
Wauters, Tim
Volckaert, Bruno
De Turck, Filip
- Abstract:
- This article describes an experimental investigation into the inter-dataset generalization of supervised machine learning methods, trained to distinguish between benign and several classes of malicious network flows. The first part details the process and results of establishing reference classification scores on CIC-IDS2017 and CSE-CIC-IDS2018, two modern, labeled data sets for testing intrusion detection systems. The data sets are divided into several days each pertaining to different attack classes (DoS, DDoS, infiltration, botnet, etc.). A pipeline has been created that includes twelve supervised learning algorithms from different families. Subsequently to this comparative analysis the DoS / SSL and botnet attack classes, which are represented in both data sets and are well-classified by many algorithms, have been selected to test the inter-dataset generalization strength of the trained models. Exposure of these models to unseen, but related samples without additional training was expected to maintain high classification performance, but this assumption is shown to be erroneous (at least for the tested attack classes). To our knowledge, there is no prior literature that validates the efficacy of supervised ML-based intrusion detection systems outside of the dataset(s) on which they have been trained. Our first results question the implied link that great intra-dataset generalization leads to great inter- or extra-dataset generalization. Further experimentation is required to discover the scope and causes of this deficiency as well as potential solutions.
- Is Part Of:
- Journal of information security and applications. Volume 54(2020)
- Journal:
- Journal of information security and applications
- Issue:
- Volume 54(2020)
- Issue Display:
- Volume 54, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 54
- Issue:
- 2020
- Issue Sort Value:
- 2020-0054-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-10
- Subjects:
- Binary classification -- CIC-IDS2017 -- CSE-CIC-IDS2018 -- Generalization strength -- Intrusion detection -- Supervised machine learning
Computer security -- Periodicals
Information technology -- Security measures -- Periodicals
005.805
- Journal URLs:
- http://www.sciencedirect.com/
- DOI:
- 10.1016/j.jisa.2020.102564
- Languages:
- English
- ISSNs:
- 2214-2126
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 22441.xml