IRAICLDec 7, 2017

A Deep Network Model for Paraphrase Detection in Short Text Messages

arXiv:1712.02820v1117 citations
Originality Incremental advance
AI Analysis

This addresses the problem of paraphrase detection for applications like text mining and plagiarism detection in user-generated noisy texts, representing an incremental improvement over existing methods.

The paper tackled paraphrase detection in noisy short texts like Twitter by proposing a deep neural network model combining CNN, LSTM, and word-level similarity matching, which outperformed state-of-the-art methods on noisy social media data and achieved competitive performance on cleaner corpora.

This paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is that existing paraphrase systems perform well when applied on clean texts, but they do not necessarily deliver good performance against noisy texts. Challenges with paraphrase detection on user generated short texts, such as Twitter, include language irregularity and noise. To cope with these challenges, we propose a novel deep neural network-based approach that relies on coarse-grained sentence modeling using a convolutional neural network and a long short-term memory model, combined with a specific fine-grained word-level similarity matching model. Our experimental results show that the proposed approach outperforms existing state-of-the-art approaches on user-generated noisy social media data, such as Twitter texts, and achieves highly competitive performance on a cleaner corpus.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes