Supervised classification of spam emails with natural language stylometry. Issue 8 (November 2016)
- Record Type:
- Journal Article
- Title:
- Supervised classification of spam emails with natural language stylometry. Issue 8 (November 2016)
- Main Title:
- Supervised classification of spam emails with natural language stylometry
- Authors:
- Shams, Rushdi
Mercer, Robert
- Abstract:
- Email spam is one of the biggest threats to today's Internet. To deal with this threat, there are long-established measures like supervised anti-spam filters. In this paper, we report the development and evaluation of Sentinel, an anti-spam filter based on natural language and stylometry attributes. The performance of the filter is evaluated not only on non-personalized emails (i.e., emails collected randomly) but also on personalized emails (i.e., emails collected from particular individuals). Among the non-personalized datasets are CSDMC2010, SpamAssassin, and LingSpam, while the Enron-Spam collection comprises personalized emails. The proposed filter extracts natural language attributes from email text that are closely related to writer stylometry and generates classifiers using multiple learning algorithms. Experimental outcomes show that classifiers generated by meta-learning algorithms such as AdaBoostM1 and bagging are the best, performing equally well and surpassing the performance of a number of filters proposed in previous studies, while a random forest generated classifier is a close second. On the other hand, the performance of classifiers using support vector machine and Naïve Bayes is not satisfactory. In addition, we find much improved results on personalized emails and mixed results on non-personalized emails.
- Is Part Of:
- Neural computing & applications. Volume 27:Issue 8(2016)
- Journal:
- Neural computing & applications
- Issue:
- Volume 27:Issue 8(2016)
- Issue Display:
- Volume 27, Issue 8 (2016)
- Year:
- 2016
- Volume:
- 27
- Issue:
- 8
- Issue Sort Value:
- 2016-0027-0008-0000
- Page Start:
- 2315
- Page End:
- 2331
- Publication Date:
- 2016-11
- Subjects:
- Spam classification -- Natural language processing -- Stylometry -- Supervised machine learning -- Text classification -- Computational linguistics -- Text mining -- Performance evaluation
Neural networks (Computer science) -- Periodicals
Neural circuitry -- Periodicals
Artificial intelligence -- Periodicals
Neural Networks (Computer) -- Periodicals
Réseaux neuronaux (Informatique) -- Périodiques
Réseaux nerveux -- Périodiques
Intelligence artificielle -- Périodiques
006.32
- Journal URLs:
- http://www.springerlink.com/content/0941-0643/20/6/
http://www.springerlink.com/content/102827/
http://www.springer.com/gb/
- DOI:
- 10.1007/s00521-015-2069-7
- Languages:
- English
- ISSNs:
- 0941-0643
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 6081.280250
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10048.xml