Distributional semantics beyond words: Supervised learning of analogy and paraphrase
This work addresses the challenge of automating similarity measurement in natural language processing for tasks such as analogy and paraphrase, replacing the hand-coded combination functions of previous methods with functions learned from data.
The paper tackles the problem of extending distributional semantics beyond individual words to measure similarity for word pairs, phrases, and sentences. It proposes a supervised learning approach to generate combination functions and achieves state-of-the-art results on SAT analogies, SemEval 2012 Task 2, and multiple-choice paraphrase questions.
There have been several efforts to extend distributional semantics beyond individual words, to measure the similarity of word pairs, phrases, and sentences (briefly, tuples; ordered sets of words, contiguous or noncontiguous). One way to extend beyond words is to compare two tuples using a function that combines pairwise similarities between the component words in the tuples. A strength of this approach is that it works with both relational similarity (analogy) and compositional similarity (paraphrase). However, past work required hand-coding the combination function for different tasks. The main contribution of this paper is that combination functions are generated by supervised learning. We achieve state-of-the-art results in measuring relational similarity between word pairs (SAT analogies and SemEval 2012 Task 2) and measuring compositional similarity between noun-modifier phrases and unigrams (multiple-choice paraphrase questions).
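To make the idea of a combination function concrete, the following is a minimal sketch and not the paper's implementation. It assumes toy three-dimensional word vectors with made-up values, cosine as the pairwise word similarity, and a small logistic regression standing in for whatever supervised learner is actually used. All names (`VECTORS`, `pairwise_features`, `hand_coded_combination`, `train_combination`, `learned_similarity`) are hypothetical.

```python
import math
from itertools import product

# Toy word vectors with made-up values (an assumption for illustration only).
VECTORS = {
    "mason": [0.9, 0.1, 0.3],
    "stone": [0.2, 0.9, 0.1],
    "carpenter": [0.8, 0.2, 0.4],
    "wood": [0.1, 0.8, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_features(tuple_a, tuple_b):
    """Similarity of every word in tuple_a to every word in tuple_b, in order."""
    return [cosine(VECTORS[x], VECTORS[y]) for x, y in product(tuple_a, tuple_b)]


def hand_coded_combination(features):
    """A fixed combination function: the plain average of pairwise similarities."""
    return sum(features) / len(features)


def learned_similarity(weights, bias, features):
    """Combination function with learned parameters (logistic model)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_combination(examples, epochs=500, lr=0.5):
    """Fit the logistic weights by stochastic gradient descent on (features, label)."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for features, label in examples:
            err = learned_similarity(weights, bias, features) - label
            weights = [w - lr * err * f for w, f in zip(weights, features)]
            bias -= lr * err
    return weights, bias


# mason:stone :: carpenter:wood is a good analogy; swapping the second pair is not.
good = pairwise_features(("mason", "stone"), ("carpenter", "wood"))
bad = pairwise_features(("mason", "stone"), ("wood", "carpenter"))
weights, bias = train_combination([(good, 1), (bad, 0)])
```

In this toy setup the averaging function gives both tuple pairs essentially the same score, because it ignores which word is compared with which. The trained weights can separate them, which illustrates why the choice of combination function matters for order-sensitive tasks such as analogy.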