CL · Feb 23, 2016

Sentence Similarity Learning by Lexical Decomposition and Composition

arXiv:1602.07019v2 · 207 citations
AI Analysis

This work addresses sentence similarity tasks for natural language processing applications, offering an incremental improvement by incorporating dissimilarities into existing methods.

The paper tackles sentence similarity by modeling both the similar and the dissimilar parts of two sentences through lexical decomposition and composition. It reports state-of-the-art performance on answer sentence selection and comparable results on paraphrase identification.

Most conventional sentence similarity methods only focus on similar parts of two input sentences, and simply ignore the dissimilar parts, which usually give us some clues and semantic meanings about the sentences. In this work, we propose a model to take into account both the similarities and dissimilarities by decomposing and composing lexical semantics over sentences. The model represents each word as a vector, and calculates a semantic matching vector for each word based on all words in the other sentence. Then, each word vector is decomposed into a similar component and a dissimilar component based on the semantic matching vector. After this, a two-channel CNN model is employed to capture features by composing the similar and dissimilar components. Finally, a similarity score is estimated over the composed feature vectors. Experimental results show that our model gets the state-of-the-art performance on the answer sentence selection task, and achieves a comparable result on the paraphrase identification task.
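The decomposition step described in the abstract can be illustrated with a small sketch. The abstract does not say which matching or decomposition functions are used, so the choices here are assumptions made for illustration only: the semantic matching vector is taken to be the most cosine-similar word vector in the other sentence, and the split is an orthogonal projection (the similar component is parallel to the matching vector, the dissimilar component is the residual). The helper names (`matching_vector`, `decompose`, `decompose_sentence`) and the toy 2-d embeddings are hypothetical, and the two-channel CNN composition step is not shown.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def matching_vector(word, other_sentence):
    """Assumed matching function: the most similar word in the other sentence."""
    return max(other_sentence, key=lambda w: cosine(word, w))


def decompose(word, match):
    """Assumed orthogonal decomposition of `word` with respect to `match`.

    similar    = projection of `word` onto `match`
    dissimilar = `word` minus that projection
    """
    denom = dot(match, match)
    scale = dot(word, match) / denom if denom else 0.0
    similar = [scale * m for m in match]
    dissimilar = [w - s for w, s in zip(word, similar)]
    return similar, dissimilar


def decompose_sentence(sentence, other_sentence):
    """Split every word vector of `sentence` against `other_sentence`."""
    return [decompose(w, matching_vector(w, other_sentence)) for w in sentence]


# Toy 2-d "embeddings" (made-up values, not from the paper).
s = [[1.0, 0.0], [2.0, 1.0]]
t = [[2.0, 0.0], [0.0, 3.0]]
parts = decompose_sentence(s, t)
for word, (similar, dissimilar) in zip(s, parts):
    print(word, "->", similar, "+", dissimilar)
```

In this sketch the two components of each word always sum back to the original word vector. Stacking the similar components and the dissimilar components per sentence would give the two input channels that the abstract says are fed to the CNN.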

Code Implementations: 1 repo