A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art. (October 2019)
- Record Type:
- Journal Article
- Title:
- A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art. (October 2019)
- Main Title:
- A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art
- Authors:
- Lastra-Díaz, Juan J.
Goikoetxea, Josu
Hadj Taieb, Mohamed Ali
García-Serrano, Ana
Ben Aouicha, Mohamed
Agirre, Eneko
- Abstract:
- Human similarity and relatedness judgements between concepts underlie most of cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, the proposal of methods for the estimation of the degree of similarity and relatedness between words and concepts has been a very active line of research in the fields of artificial intelligence, information retrieval and natural language processing among others. Main approaches proposed in the literature can be categorised in two large families as follows: (1) Ontology-based semantic similarity Measures (OM) and (2) distributional measures whose most recent and successful methods are based on Word Embedding (WE) models. However, the lack of a deep analysis of both families of methods slows down the advance of this line of research and its applications. This work introduces the largest, reproducible and detailed experimental survey of OM measures and WE models reported in the literature which is based on the evaluation of both families of methods on a same software platform, with the aim of elucidating what is the state of the problem. We show that WE models which combine distributional and ontology-based information get the best results, and in addition, we show for the first time that a simple average of two best performing WE models with other ontology-based measures or WE models is able to improve the state of the art by a large margin. In addition, we provide a very detailed reproducibility protocol together with a collection of software tools and datasets as supplementary material to allow the exact replication of our results.
Highlights:
A large reproducible survey of ontology-based similarity measures and word embeddings.
Embeddings using ontologies get the best overall results on word similarity and relatedness.
Best performing WordNet-based similarity measures use IC models & path-based features.
Linear combinations of best-performing word embeddings improve the state of the art.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 85(2019)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 85(2019)
- Issue Display:
- Volume 85, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 85
- Issue:
- 2019
- Issue Sort Value:
- 2019-0085-2019-0000
- Page Start:
- 645
- Page End:
- 665
- Publication Date:
- 2019-10
- Subjects:
- Ontology-based semantic similarity measures -- Word embedding models -- Information Content models -- WordNet -- Experimental survey -- HESML
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2019.07.010
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11678.xml