Comparison of machine-learning algorithms for classification of VPN network traffic flow using time-related features. Issue 2 (2nd April 2017)
- Record Type:
- Journal Article
- Title:
- Comparison of machine-learning algorithms for classification of VPN network traffic flow using time-related features. Issue 2 (2nd April 2017)
- Main Title:
- Comparison of machine-learning algorithms for classification of VPN network traffic flow using time-related features
- Authors:
- Bagui, Sikha
Fang, Xingang
Kalaimannan, Ezhil
Bagui, Subhash C.
Sheehan, Joseph
- Abstract:
- ABSTRACT: Network traffic classification and characterisation is playing an increasingly vital role in understanding and solving security-related issues in internet-based applications. The priority of research studies in this area has focused on characterisation of network traffic based on various layers of communication protocols as outlined in the TCP/IP stack and even further expanded to concentrate on specific application-layer protocols. Virtual Private Networks (VPNs) have become one of the most popular remote access communication methods among users over the public internet and other Internet Protocol (IP)-based networks. VPNs are governed by IP Security, which is a suite of protocols used for tunnelling the already encrypted IP traffic, to guarantee secure remote access to servers. In this paper, we propose and develop a framework to classify VPN or non-VPN network traffic using time-related features. Our focus is on classification of network traffic which is encrypted, tunnelled through a VPN, and the one which is normally encrypted (non-VPN transmission), using machine-learning techniques on data sets of time-related features. Six classification models: logistic regression, support vector machine, Naïve Bayes, k-nearest neighbour and ensemble methods – the Random Forest (RF) classifier and Gradient Boosting Tree (GBT) classifiers – are compared, and recommendations of optimised RF and GBT models over other models are provided in terms of high accuracy and low overfitting. Features which contributed to achieve 90% accuracy in each category were also identified.
- Is Part Of:
- Journal of cyber security technology. Volume 1:Issue 2(2017)
- Journal:
- Journal of cyber security technology
- Issue:
- Volume 1:Issue 2(2017)
- Issue Display:
- Volume 1, Issue 2 (2017)
- Year:
- 2017
- Volume:
- 1
- Issue:
- 2
- Issue Sort Value:
- 2017-0001-0002-0000
- Page Start:
- 108
- Page End:
- 126
- Publication Date:
- 2017-04-02
- Subjects:
- VPN -- supervised machine learning -- network traffic flow classification -- random forest -- gradient boosting tree
Computer security -- Periodicals
Data encryption (Computer science) -- Periodicals
005.805
- Journal URLs:
- http://www.tandfonline.com/
- DOI:
- 10.1080/23742917.2017.1321891
- Languages:
- English
- ISSNs:
- 2374-2917
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1794.xml