Detecting Web Spam Based on Novel Features from Web Page Source Code. (17th December 2020)
- Record Type:
- Journal Article
- Title:
- Detecting Web Spam Based on Novel Features from Web Page Source Code. (17th December 2020)
- Main Title:
- Detecting Web Spam Based on Novel Features from Web Page Source Code
- Authors:
- Liu, Jiayong
Su, Yu
Lv, Shun
Huang, Cheng
- Other Names:
- Zhang, Liguo (Academic Editor)
- Abstract:
- Search engines are critical in people's daily lives because they determine the quality of the information people obtain through searching. Fierce competition for search-engine rankings benefits neither users nor search engines. Existing research mainly studies the content and links of websites; however, none of these techniques focuses on semantic analysis of links and anchor text for detection. In this paper, we propose a web spam detection method that extracts novel feature sets from the homepage source code and uses random forest (RF) as the classifier. The novel feature sets are extracted from the homepage's links, its hypertext markup language (HTML) structure, and the semantic similarity of its content. We conduct experiments on the WEBSPAM-UK2007 and UK-2011 datasets using five-fold cross-validation. In addition, we design three sets of experiments to evaluate the performance of the proposed method. Compared across different indicators, the proposed method with the novel feature sets outperforms other methods, with a precision of 0.929 and a recall of 0.930. Experimental results show that the proposed model can effectively detect web spam.
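The abstract names three feature sources taken from a homepage's source code: its links, its HTML structure, and the similarity of its content. As a rough illustration only, the sketch below computes a few features of that kind with Python's standard library. The feature names (`num_links`, `external_link_ratio`, `mean_anchor_words`, `num_distinct_tags`, `num_script_tags`, `anchor_title_cosine`) are hypothetical, and the bag-of-words cosine between anchor text and page title is a crude lexical stand-in for the paper's semantic similarity. None of this is the authors' actual feature set.

```python
import math
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class HomepageFeatureParser(HTMLParser):
    """Hypothetical parser collecting link, anchor-text, title, and tag statistics."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.tag_counts = Counter()      # HTML structure: occurrences per tag name
        self.internal_links = 0
        self.external_links = 0
        self.anchor_texts = []           # one whitespace-normalized string per <a>
        self.title_parts = []
        self._in_anchor = False
        self._in_title = False
        self._anchor_buf = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlparse(href).netloc
            # Relative links (empty host) or same-host links count as internal.
            if host and host != self.site_host:
                self.external_links += 1
            else:
                self.internal_links += 1
            self._in_anchor = True
            self._anchor_buf = []
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.anchor_texts.append(" ".join("".join(self._anchor_buf).split()))
            self._in_anchor = False
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_anchor:
            self._anchor_buf.append(data)
        if self._in_title:
            self.title_parts.append(data)


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (lexical proxy, not true semantic similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def extract_features(html, site_host):
    """Return a hypothetical feature dict for one homepage's HTML source."""
    parser = HomepageFeatureParser(site_host)
    parser.feed(html)
    parser.close()
    total = parser.internal_links + parser.external_links
    anchors = parser.anchor_texts
    return {
        "num_links": total,
        "external_link_ratio": parser.external_links / total if total else 0.0,
        "mean_anchor_words": sum(len(t.split()) for t in anchors) / len(anchors) if anchors else 0.0,
        "num_distinct_tags": len(parser.tag_counts),
        "num_script_tags": parser.tag_counts["script"],
        "anchor_title_cosine": cosine_similarity(" ".join(anchors), "".join(parser.title_parts)),
    }


# Usage on a made-up page (not taken from WEBSPAM-UK2007 or UK-2011).
sample = """<html><head><title>Cheap loans cheap loans</title></head>
<body><a href="/about">About us</a>
<a href="http://spam.example.net/x">cheap loans now</a>
<script>var x=1;</script></body></html>"""
print(extract_features(sample, "www.example.com"))
```

In a pipeline like the one the abstract describes, one such feature vector per homepage would be fed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`) and evaluated with five-fold cross-validation.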
- Is Part Of:
- Security and communication networks. Volume 2020(2020)
- Journal:
- Security and communication networks
- Issue:
- Volume 2020(2020)
- Issue Display:
- Volume 2020, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 2020
- Issue:
- 2020
- Issue Sort Value:
- 2020-2020-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-12-17
- Subjects:
- Computer networks -- Security measures -- Periodicals
Computer security -- Periodicals
Cryptography -- Periodicals
005.805
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1939-0122
https://www.hindawi.com/journals/scn/
http://onlinelibrary.wiley.com/
- DOI:
- 10.1155/2020/6662166
- Languages:
- English
- ISSNs:
- 1939-0114
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 15375.xml