A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. (November 2015)
- Record Type:
- Journal Article
- Title:
- A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. (November 2015)
- Main Title:
- A novel family of IC-based similarity measures with a detailed experimental survey on WordNet
- Authors:
- Lastra-Díaz, Juan J.
García-Serrano, Ana
- Abstract:
- This paper introduces a novel family of ontology-based similarity measures based on the Information Content (IC) theory, a detailed state of the art, a large experimental survey into ontology-based similarity measures on WordNet, and a new comparison between intrinsic and corpus-based IC models. Our experiments are based on our implementation of a large set of similarity measures, intrinsic and corpus-based IC models, which are evaluated on two known datasets and three different WordNet versions. The new measures are called weighted Jiang–Conrath distance (wJ&Cdist) and similarity (wJ&Csim), cosine-normalized Jiang–Conrath similarity (cosJ&Csim) and cosine-normalized weighted Jiang–Conrath similarity (coswJ&Csim). Two of our similarity measures outperform the state-of-the-art measures on the RG65 dataset, and one of them obtains the third overall score on all the datasets and evaluated WordNet versions. The cosine-normalized similarity measures are a non-linear normalization of the classic Jiang–Conrath (J&C) distance and the new wJ&C distance. On the other hand, the wJ&C distance is a generalization of the classic J&C distance which is based on the length of the shortest path between concepts within an IC-based weighted graph. Our measures are based on two not previously considered notions: (1) a generalization of the classic J&C distance to any type of taxonomy, based on an IC-based weighted graph derived from the conditional probabilities between child and parent concepts, and (2) a non-linear normalization function that converts the ontology-based semantic distances into similarity functions. Finally, the corpus-based IC models based on the Resnik method obtain rivaling results as regards the state-of-the-art intrinsic IC models, when they are used with some unexplored WordNet-based frequency files. Therefore, this latter fact allows us to reconsider some previous conclusions about the outperformance of the intrinsic IC models over the corpus-based ones.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 46:Part A(2015:Oct.)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 46:Part A(2015:Oct.)
- Issue Display:
- Volume 46 (2015)
- Year:
- 2015
- Volume:
- 46
- Issue Sort Value:
- 2015-0046-0000-0000
- Page Start:
- 140
- Page End:
- 153
- Publication Date:
- 2015-11
- Subjects:
- Ontology-based semantic similarity measures -- IC-based measures -- Semantic similarity -- Intrinsic and corpus-based information content models -- Jiang–Conrath distance -- Semantic similarity on WordNet survey
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2015.09.006
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 148.xml
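
The abstract in this record describes the weighted J&C distance as the length of the shortest path between two concepts in a graph whose edge weights derive from the conditional probabilities between child and parent concepts. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes each edge is weighted by -log2 P(child | parent), which is an assumption made for illustration, and the toy taxonomy, its probabilities, and all function names are hypothetical. On a tree, this path length coincides with the classic J&C distance IC(a) + IC(b) - 2·IC(lcs), where lcs is the lowest common subsumer.

```python
import heapq
import math

# Hypothetical toy taxonomy: child -> {parent: P(child | parent)}.
# The probabilities are made up for illustration only.
COND_PROB = {
    "animal": {"entity": 0.5},
    "plant": {"entity": 0.5},
    "dog": {"animal": 0.25},
    "cat": {"animal": 0.75},
}


def build_graph(cond_prob):
    """Build an undirected graph with edge weight -log2 P(child | parent).

    This weighting is an assumption of the sketch, not taken from the record.
    """
    graph = {}
    for child, parents in cond_prob.items():
        for parent, p in parents.items():
            weight = -math.log2(p)
            graph.setdefault(child, {})[parent] = weight
            graph.setdefault(parent, {})[child] = weight
    return graph


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm; returns math.inf if target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return math.inf


def classic_jc_distance(ic, a, b, lcs):
    """Classic Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(lcs)."""
    return ic[a] + ic[b] - 2.0 * ic[lcs]


graph = build_graph(COND_PROB)
# IC of each concept is its weighted distance from the root "entity".
ic = {node: shortest_path_length(graph, "entity", node) for node in graph}
weighted = shortest_path_length(graph, "dog", "cat")
classic = classic_jc_distance(ic, "dog", "cat", "animal")
```

In this tree-shaped example `weighted` and `classic` agree up to floating-point rounding. The two can differ on taxonomies with multiple inheritance, where the shortest weighted path need not pass through a single common ancestor.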