Global Textual Relation Embedding for Relational Understanding
This work addresses the need for relational understanding in NLP by providing embeddings at an intermediate level between words and sentences. The contribution is incremental in that it builds on existing embedding and distant supervision techniques.
The paper tackles the problem of learning general-purpose embeddings for textual relations, where a textual relation is defined as the shortest dependency path between two entities. To do so, the authors build what they describe as the largest distant supervision dataset by linking the entire English ClueWeb09 corpus to Freebase, and they train the embedding using global co-occurrence statistics between textual relations and knowledge base relations as the supervision signal. Evaluations on two relational understanding tasks indicate that the learned embeddings help downstream tasks of this kind.
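To make the definition of a textual relation concrete, the following is a minimal sketch of extracting the shortest dependency path between two entities. It assumes a hand-written toy dependency parse rather than the output of a real parser; the function name `shortest_dependency_path`, the edge format, and the arrow notation for edge direction are illustrative assumptions, not the paper's implementation.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over an undirected view of a dependency tree.

    edges: list of (head, label, dependent) triples (hypothetical format).
    Returns a list alternating tokens and direction-marked labels,
    or None if the two tokens are not connected.
    """
    graph = {}
    for head, label, dep in edges:
        # Record both directions so the path can go up and down the tree.
        graph.setdefault(head, []).append((dep, f"-{label}->"))
        graph.setdefault(dep, []).append((head, f"<-{label}-"))

    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for neighbor, arrow in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [arrow, neighbor]))
    return None


# Toy parse of "Obama was born in Honolulu" (labels written by hand, an assumption).
toy_edges = [
    ("born", "nsubjpass", "Obama"),
    ("born", "auxpass", "was"),
    ("born", "nmod", "Honolulu"),
    ("Honolulu", "case", "in"),
]

path = shortest_dependency_path(toy_edges, "Obama", "Honolulu")
# Drop the two entity endpoints to keep only the relation between them.
textual_relation = " ".join(path[1:-1])
print(textual_relation)
```

In this sketch the entity tokens are removed from the path so that the same textual relation can be shared across many different entity pairs.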
Pre-trained embeddings such as word embeddings and sentence embeddings are fundamental tools facilitating a wide range of downstream NLP tasks. In this work, we investigate how to learn a general-purpose embedding of textual relations, defined as the shortest dependency path between entities. Textual relation embedding provides a level of knowledge between word/phrase level and sentence level, and we show that it can facilitate downstream tasks requiring relational understanding of the text. To learn such an embedding, we create the largest distant supervision dataset by linking the entire English ClueWeb09 corpus to Freebase. We use global co-occurrence statistics between textual and knowledge base relations as the supervision signal to train the embedding. Evaluation on two relational understanding tasks demonstrates the usefulness of the learned textual relation embedding. The data and code can be found at https://github.com/czyssrs/GloREPlus
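The supervision signal described above can be illustrated with a small sketch. The abstract does not specify how the global co-occurrence statistics are computed or normalized, so this example assumes the simplest reading: count how often each textual relation co-occurs with each knowledge base relation across the whole distantly supervised corpus, then normalize the counts per textual relation. The function name `global_cooccurrence`, the relation strings, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def global_cooccurrence(pairs):
    """Aggregate (textual_relation, kb_relation) pairs over a whole corpus.

    Returns, for each textual relation, a distribution over KB relations.
    Row normalization is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for textual_rel, kb_rel in pairs:
        counts[textual_rel][kb_rel] += 1

    distributions = {}
    for textual_rel, kb_counts in counts.items():
        total = sum(kb_counts.values())
        distributions[textual_rel] = {
            kb_rel: n / total for kb_rel, n in kb_counts.items()
        }
    return distributions


# Toy distant supervision pairs: each one stands for a sentence whose entity
# pair is linked by the given KB relation (made-up data, not from the paper).
toy_pairs = [
    ("<-nsubjpass- born -nmod->", "place_of_birth"),
    ("<-nsubjpass- born -nmod->", "place_of_birth"),
    ("<-nsubjpass- born -nmod->", "place_of_birth"),
    ("<-nsubjpass- born -nmod->", "place_lived"),
    ("<-nsubj- lives -nmod->", "place_lived"),
    ("<-nsubj- lives -nmod->", "place_lived"),
]

toy_dist = global_cooccurrence(toy_pairs)
for textual_rel, dist in toy_dist.items():
    print(textual_rel, dist)
```

Under this reading, a model that embeds a textual relation would be trained so that a prediction made from the embedding matches the corresponding distribution. Aggregating over the whole corpus, rather than trusting each individual sentence label, is what makes the statistics global.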