CL, AI, LG | Feb 14, 2020

Semantic Relatedness and Taxonomic Word Embeddings

arXiv:2002.06235v1 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of improving semantic representation in NLP by exploring taxonomic embeddings, but it appears incremental as it builds on prior research without introducing a major breakthrough.

The paper investigates how different types of semantic relatedness, particularly thematic versus taxonomic, are encoded in word embeddings, and analyzes the effects of synthetic corpus properties and corpus size interactions on embedding performance.

This paper connects a series of papers dealing with taxonomic word embeddings. It begins by noting that there are different types of semantic relatedness and that different lexical representations encode different forms of relatedness. A particularly important distinction within semantic relatedness is that between thematic and taxonomic relatedness. Next, it presents a number of experiments that analyse taxonomic embeddings trained on a synthetic corpus generated via a random walk over a taxonomy. These experiments demonstrate how the properties of the synthetic corpus, such as the percentage of rare words, are affected by the shape of the knowledge graph from which the corpus is generated. Finally, it explores how the relative sizes of the natural and synthetic corpora affect embedding performance when taxonomic and thematic embeddings are combined.
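The random-walk corpus generation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy taxonomy, the function names, the choice of a uniform walk over undirected hypernym links, and the fixed sentence length are hypothetical and are not taken from the paper, whose walk procedure may differ. The sketch shows only the general idea: each walk yields one pseudo-sentence, and the share of rare words in the resulting corpus can then be measured against a frequency threshold.

```python
import random
from collections import Counter

# Hypothetical toy taxonomy: each entry maps a child concept to its hypernym.
HYPERNYMS = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "lion": "feline", "feline": "mammal",
    "mammal": "animal", "sparrow": "bird", "bird": "animal",
}


def build_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    # Sorted neighbour lists keep the walk reproducible for a given seed.
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def random_walk_corpus(graph, n_sentences, walk_length, seed=0):
    """Generate pseudo-sentences, one per uniform random walk (an assumption)."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    corpus = []
    for _ in range(n_sentences):
        node = rng.choice(nodes)
        sentence = [node]
        for _ in range(walk_length - 1):
            node = rng.choice(graph[node])
            sentence.append(node)
        corpus.append(sentence)
    return corpus


def rare_word_percentage(corpus, vocabulary, threshold):
    """Percentage of vocabulary items seen fewer than `threshold` times."""
    counts = Counter(token for sentence in corpus for token in sentence)
    rare = [word for word in vocabulary if counts[word] < threshold]
    return 100.0 * len(rare) / len(vocabulary)


if __name__ == "__main__":
    graph = build_graph(HYPERNYMS)
    corpus = random_walk_corpus(graph, n_sentences=200, walk_length=10)
    print(" ".join(corpus[0]))
    print(rare_word_percentage(corpus, list(graph), threshold=100))
```

In this toy setting, leaf concepts can be reached only through their single parent, so they tend to be visited less often than well-connected inner nodes. That is one simple way the shape of the graph can influence how many words end up rare in the synthetic corpus.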
