Modelling Semantic Categories using Conceptual Neighborhood
This work addresses the challenge of distinguishing categories from individuals in NLP embeddings. The contribution is incremental: it builds on existing embedding methods by adding conceptual neighborhood modeling.
The paper tackles the problem of learning vector space embeddings for semantic categories by modeling them as regions, addressing the difficulty of estimating meaningful regions with few examples. It proposes using conceptual neighbors to improve accuracy and shows that incorporating them leads to more accurate region-based representations.
While many methods for learning vector space embeddings have been proposed in the field of Natural Language Processing, these methods typically do not distinguish between categories and individuals. Intuitively, if individuals are represented as vectors, we can think of categories as (soft) regions in the embedding space. Unfortunately, meaningful regions can be difficult to estimate, especially since we often have few examples of individuals that belong to a given category. To address this issue, we rely on the fact that different categories are often highly interdependent. In particular, categories often have conceptual neighbors, which are disjoint from but closely related to the given category (e.g., fruit and vegetable). Our hypothesis is that more accurate category representations can be learned by relying on the assumption that the regions representing such conceptual neighbors should be adjacent in the embedding space. We propose a simple method for identifying conceptual neighbors and then show that incorporating these conceptual neighbors indeed leads to more accurate region-based representations.
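To make the idea of categories as soft regions concrete, here is a minimal sketch. It is not the procedure proposed in this work: it assumes, purely for illustration, that each category is a regularized Gaussian fitted to a handful of example vectors, and that an individual is assigned to whichever of two neighboring categories it lies closest to in squared Mahalanobis distance. The function names, the toy 2-D vectors, and the ridge term `reg` are all hypothetical.

```python
import numpy as np


def fit_region(points, reg=1e-3):
    """Fit a Gaussian soft region (mean, covariance) to a few example vectors."""
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    centered = points - mean
    # With few examples the sample covariance is unreliable or singular,
    # so a small ridge term is added (an assumed regularizer).
    cov = centered.T @ centered / max(len(points) - 1, 1)
    cov += reg * np.eye(points.shape[1])
    return mean, cov


def mahalanobis_sq(x, region):
    """Squared Mahalanobis distance from vector x to a fitted region."""
    mean, cov = region
    diff = np.asarray(x, dtype=float) - mean
    return float(diff @ np.linalg.solve(cov, diff))


def classify(x, regions):
    """Return the name of the region with the smallest squared distance to x."""
    return min(regions, key=lambda name: mahalanobis_sq(x, regions[name]))


# Toy 2-D "embeddings" for two conceptual neighbors (made-up values).
regions = {
    "fruit": fit_region([[1.0, 1.0], [1.4, 0.9], [0.9, 1.3], [1.2, 1.2]]),
    "vegetable": fit_region([[2.0, 2.1], [2.3, 1.8], [1.9, 2.4], [2.2, 2.2]]),
}

print(classify([1.1, 1.1], regions))
print(classify([2.1, 2.1], regions))
```

With so few examples per category, the fitted covariance depends heavily on the regularizer, which reflects the estimation difficulty described above. Knowing that two categories are conceptual neighbors supplies an extra constraint (their regions should be adjacent) that a per-category fit like this one cannot exploit.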