Word2vec Conjecture and A Limitative Result
This work addresses a foundational representational question in natural language processing and shows a limitation of vector-based semantic models. It is incremental in that it builds on prior work on word2vec.
The paper asks whether all semantic word-word relations can be represented by vector differences, a claim it calls the word2vec conjecture. It exhibits a class of relations that cannot be represented this way, which establishes a limitative result for vector spaces over fields of characteristic 0, such as the real or complex numbers.
Inspired by the success of \texttt{word2vec} \citep{mikolov2013distributed} in capturing analogies, we study the conjecture that analogical relations can be represented by vector spaces. Unlike many previous works, which focus on the distributional semantic aspect of \texttt{word2vec}, we study the purely \emph{representational} question: can \emph{all} semantic word-word relations be represented by differences (or directions) of vectors? We call this the word2vec conjecture and point out some of its desirable implications. We then exhibit a class of relations that cannot be represented in this way, thus falsifying the conjecture and establishing a limitative result for the representability of semantic relations by vector spaces over fields of characteristic 0, e.g., the real or complex numbers.
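
The role of characteristic 0 can be made concrete with a small worked sketch. This is a hypothetical illustration under stated assumptions, not necessarily the class of relations referred to above: suppose a symmetric relation $R$ (antonymy, say, with the illustrative word pair \emph{hot}/\emph{cold}) were represented by a single fixed difference vector $d$, so that $a \mathrel{R} b$ implies $v_b - v_a = d$, where $v_w$ denotes the vector assigned to word $w$. Since both $\mathit{hot} \mathrel{R} \mathit{cold}$ and $\mathit{cold} \mathrel{R} \mathit{hot}$ hold, we would need
\[
  v_{\mathrm{cold}} - v_{\mathrm{hot}} = d
  \quad\text{and}\quad
  v_{\mathrm{hot}} - v_{\mathrm{cold}} = d ,
\]
and adding the two equations gives
\[
  0 = 2d .
\]
Over a field of characteristic 0 this forces $d = 0$, hence $v_{\mathrm{hot}} = v_{\mathrm{cold}}$, so the two words would receive the same vector. Over a field of characteristic 2, by contrast, $2d = 0$ holds for every $d$, so this particular argument would not apply there.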