New document scoring model based on interval tree. (April 2018)
- Record Type:
- Journal Article
- Title:
- New document scoring model based on interval tree. (April 2018)
- Main Title:
- New document scoring model based on interval tree
- Authors:
- Xiong, Zeyu
Wang, Yijie
- Abstract:
- Classical BM25 scoring is designed for unstructured documents. In recent years, researchers have tried to adapt the BM25 ranking formula to deal with structured documents. Most work on structured document retrieval addresses the combination of field scores, but it is hard to determine the field weights before the document score is formed. We aim to establish a new method to sort the field weights. The motivation comes from two aspects. On the one hand, the construction of an interval tree reflects retrieval results with higher-order proximity for a text field. According to writing style, the important sentence or phrase representing the main idea frequently appears in the front or the rear part of a text field. Therefore, the proximity scoring for different parts of a text field should be different. We thus take a higher factor for calculating proximity scoring in the front and the rear parts than in the middle part. On the other hand, the more the interval length includes query terms, the lower the proximity score is; hence a higher tf value for a term appearing in an interval should affect the computation of proximity scoring. Therefore, we develop a new method for calculating the field weights based on the ranking score. The ranking score for each field can be calculated by an interval tree based on term relevance. The interval tree can be viewed as a tool for higher-order term proximity in text visualization. These new field weights reflect term proximity and can be used to calculate document scores for term retrieval. Experimental results show that the new document scoring model reflects term proximity well, and that the new document scoring scheme ScoreComp, combined with interval scoring, is more sensitive than the scheme FreqComp combined with interval scoring.
- Is Part Of:
- Journal of visual languages & computing. Volume 45(2018)
- Journal:
- Journal of visual languages & computing
- Issue:
- Volume 45(2018)
- Issue Display:
- Volume 45, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 45
- Issue:
- 2018
- Issue Sort Value:
- 2018-0045-2018-0000
- Page Start:
- 39
- Page End:
- 43
- Publication Date:
- 2018-04
- Subjects:
- Interval scoring -- Field weights -- Term frequency -- Interval tree -- Text visualization
Visual programming languages (Computer science) -- Periodicals
Visual programming (Computer science) -- Periodicals
Programming languages (Electronic computers) -- Semantics -- Periodicals
Langages de programmation visuelle -- Périodiques
Programmation visuelle -- Périodiques
Langages de programmation -- Sémantique -- Périodiques
Programming languages (Electronic computers) -- Semantics
Visual programming (Computer science)
Visual programming languages (Computer science)
Periodicals
Electronic journals
005
- Journal URLs:
- http://www.sciencedirect.com/science/journal/1045926X
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.jvlc.2018.01.003
- Languages:
- English
- ISSNs:
- 1045-926X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5072.495200
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12299.xml