LG | CL | ML | Jun 20, 2018

The Corpus Replication Task

arXiv:1806.07978v1
Originality: Synthesis-oriented
AI Analysis

This work addresses foundational questions in NLP about relational similarity in word embeddings, but it appears incremental as it revisits and extends known concepts without introducing a new paradigm.

The paper asks which relations can be represented in word embeddings produced by word2vec and how those relations are built. It proposes the Corpus Replication Task, which consists of generating input text for which word2vec yields a set of target relations, and expects solutions to this task to provide partial answers to both questions.

In the field of Natural Language Processing (NLP), we revisit the well-known word embedding algorithm word2vec. Word embeddings represent words as vectors such that the words' distributional similarity is captured. Unexpectedly, besides semantic similarity, relational similarity has also been shown to be captured in word embeddings generated by word2vec, which raises two questions. First, which kinds of relations are representable in continuous space? Second, how are relations built? To tackle these questions, we propose a bottom-up point of view. We call the task of generating input text for which word2vec outputs target relations the Corpus Replication Task. Since we deem generalizations of this approach to any set of relations possible, we expect that solving the Corpus Replication Task will provide partial answers to these questions.
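The bottom-up idea can be illustrated with a small Python sketch. It is not the paper's method: raw co-occurrence counts stand in for word2vec training, and the word pairs, the marker words `he` and `she`, the `ctx{i}` context tokens, and all function names are hypothetical choices made for this example. The sketch generates a toy corpus in which both words of a pair share a context word and differ only in a relation marker, then checks that the offset vectors of two pairs point in the same direction.

```python
from collections import Counter, defaultdict
from itertools import permutations
import math


def build_corpus(pairs, marker_a="he", marker_b="she", repeats=5):
    """Generate toy sentences: both words of a pair share one context
    word and differ only in the relation marker (hypothetical design)."""
    corpus = []
    for i, (a, b) in enumerate(pairs):
        shared = f"ctx{i}"  # context word unique to this pair
        for _ in range(repeats):
            corpus.append([a, shared, marker_a])
            corpus.append([b, shared, marker_b])
    return corpus


def cooccurrence_vectors(corpus):
    """Count, for every word, the other words in the same sentence.
    This is a stand-in for word2vec, not word2vec itself."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        for word, context in permutations(sentence, 2):
            vectors[word][context] += 1
    return vectors


def offset(vectors, a, b):
    """Return the sparse difference vector b - a."""
    keys = set(vectors[a]) | set(vectors[b])
    return {k: vectors[b][k] - vectors[a][k] for k in keys}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


pairs = [("king", "queen"), ("man", "woman"), ("uncle", "aunt")]
corpus = build_corpus(pairs)
vectors = cooccurrence_vectors(corpus)

d1 = offset(vectors, "king", "queen")
d2 = offset(vectors, "man", "woman")
print(cosine(d1, d2))  # close to 1.0: the two offsets are parallel
```

In this toy setting the shared context word cancels out in each offset, so only the marker words remain and every pair has the same offset. Whether a corpus built this way yields the same relations under actual word2vec training is the question the Corpus Replication Task poses; the sketch does not answer it.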
