CL AI Jun 28, 2021

Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings

Shib Sankar Dasgupta, Michael Boratko, Siddhartha Mishra, Shriya Atmakuri, Dhruvesh Patel, Xiang Lorraine Li, Andrew McCallum

arXiv:2106.14361v2 30.4640 citations Has Code

Originality: Highly original

AI Analysis

Word2Box addresses a limitation of vector dot-product similarity in NLP, namely that it cannot capture richer interactions between words, and offers a novel approach for tasks that require set-theoretic reasoning.

The paper tackles the problem of representing set-theoretic relationships between words, such as those found in adjective-noun compounds and homographs, by introducing Word2Box, which learns box embeddings of words. It reports improved performance on word similarity tasks, especially for less common words, and includes quantitative and qualitative analysis of the added expressivity.

Learning representations of words in a continuous space is perhaps the most fundamental task in NLP; however, words interact in ways much richer than vector dot product similarity can provide. Many relationships between words can be expressed set-theoretically: for example, adjective-noun compounds (e.g. "red cars" $\subseteq$ "cars") and homographs (e.g. "tongue" $\cap$ "body" should be similar to "mouth", while "tongue" $\cap$ "language" should be similar to "dialect") have natural set-theoretic interpretations. Box embeddings are a novel region-based representation which provides the capability to perform these set-theoretic operations. In this work, we provide a fuzzy-set interpretation of box embeddings, and learn box representations of words using a set-theoretic training objective. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a quantitative and qualitative analysis exploring the additional unique expressivity provided by Word2Box.
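To make the set-theoretic operations in the abstract concrete, the following is a minimal sketch in plain Python of intersection and containment on axis-aligned boxes. It assumes hard (non-fuzzy) boxes stored as a pair of corner lists, which is a simplification: the paper works with a fuzzy-set interpretation and a learned training objective, neither of which is reproduced here. The function names and the hand-picked 2-D toy boxes are illustrative assumptions, not taken from the paper or its code.

```python
from math import prod

# A box is a pair (mins, maxs) of equal-length coordinate lists.
# This hard-box representation is an assumption made for illustration.

def intersect(a, b):
    """Intersection of two axis-aligned boxes (may be empty)."""
    lo = [max(x, y) for x, y in zip(a[0], b[0])]
    hi = [min(x, y) for x, y in zip(a[1], b[1])]
    return lo, hi

def volume(box):
    """Volume of a box; empty boxes (any negative side) have volume 0."""
    lo, hi = box
    return prod(max(0.0, h - l) for l, h in zip(lo, hi))

def containment(a, b):
    """Fraction of a's volume that lies inside b; 1.0 means a is a subset of b."""
    va = volume(a)
    return volume(intersect(a, b)) / va if va > 0 else 0.0

# Hypothetical toy boxes with hand-chosen coordinates (not learned embeddings).
cars = ([0.0, 0.0], [4.0, 4.0])
red_cars = ([1.0, 1.0], [2.0, 3.0])

print(containment(red_cars, cars))  # "red cars" lies entirely inside "cars"
print(containment(cars, red_cars))  # only part of "cars" lies inside "red cars"
```

Unlike a dot product, the containment score is asymmetric, which is what lets a region-based representation express a relation such as "red cars" $\subseteq$ "cars" in one direction only.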

