Research on Text Similarity Measurement Hybrid Algorithm with Term Semantic Information and TF-IDF Method. (23rd April 2022)
- Record Type:
- Journal Article
- Title:
- Research on Text Similarity Measurement Hybrid Algorithm with Term Semantic Information and TF-IDF Method. (23rd April 2022)
- Main Title:
- Research on Text Similarity Measurement Hybrid Algorithm with Term Semantic Information and TF-IDF Method
- Authors:
- Lan, Fei
- Other Names:
- Li Qiangyi, Academic Editor.
- Abstract:
- TF-IDF (term frequency-inverse document frequency) is a traditional statistical method for calculating text similarity. Because TF-IDF ignores the semantic information of words, it cannot accurately reflect the similarity between texts. Semantically enhanced methods, in turn, distinguish between documents poorly, because extending vectors with semantically similar terms aggravates the curse of dimensionality. To address this problem, the paper proposes a hybrid method that combines semantic understanding with TF-IDF to calculate text similarity. Building on the term similarity weighting tree (TSWT) data structure and the definition of semantic similarity from HowNet, the paper first discusses text preprocessing and filtering, and then uses the semantic information of the key terms, that is, the features whose weight exceeds a given threshold, to calculate the similarity of text documents. Experimental results show that the hybrid method outperforms both pure TF-IDF and the purely semantic method in accuracy, recall, and F1 score under different K-means clustering methods.
- Is Part Of:
- Advances in multimedia. Volume 2022(2022)
- Journal:
- Advances in multimedia
- Issue:
- Volume 2022(2022)
- Issue Display:
- Volume 2022, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 2022
- Issue:
- 2022
- Issue Sort Value:
- 2022-2022-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-04-23
- Subjects:
- Multimedia systems -- Periodicals
Computer networks -- Periodicals
Multimédia
Réseaux d'ordinateurs
Computer networks
Multimedia systems
Periodicals
006.7
- Journal URLs:
- https://www.hindawi.com/journals/am/
http://bibpurl.oclc.org/web/22854
- DOI:
- 10.1155/2022/7923262
- Languages:
- English
- ISSNs:
- 1687-5680
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 21434.xml
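
The abstract's TF-IDF baseline and its weight-threshold filtering step can be illustrated with a minimal sketch. This is not the paper's hybrid method: it omits the TSWT structure and the HowNet semantic similarity entirely. The function names, the toy documents, and the threshold value are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def keep_key_terms(vec, threshold):
    """Keep only features whose weight exceeds the threshold (hypothetical helper)."""
    return {t: w for t, w in vec.items() if w > threshold}


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, pre-tokenized documents (assumed data, not from the paper).
docs = [
    ["text", "similarity", "tfidf"],
    ["text", "similarity", "semantic"],
    ["image", "retrieval", "multimedia"],
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # documents sharing terms score above zero
print(cosine(vectors[0], vectors[2]))  # documents with no shared terms score zero
print(keep_key_terms(vectors[0], 0.2))  # only the highest-weight feature survives
```

Note that after filtering, the first two documents no longer share any term, so a purely lexical comparison of their key terms scores zero. This is the kind of gap that the term-level semantic information described in the abstract is meant to close.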