CL · Oct 25, 2018

Word Embedding based Edit Distance

arXiv:1810.10752v1 · 5 citations
Originality: Incremental advance
AI Analysis

The paper addresses the high cost of creating labeled data for text similarity in NLP and offers an incremental improvement over existing unsupervised methods.

The paper tackles unsupervised text similarity calculation by proposing Word Embedding based Edit Distance (WED), which integrates word embeddings into edit distance, and shows it outperforms state-of-the-art unsupervised methods on three benchmark datasets.

Text similarity calculation is a fundamental problem in natural language processing and related fields. In recent years, deep neural networks have been developed to perform the task and high performances have been achieved. The neural networks are usually trained with labeled data in supervised learning, and creation of labeled data is usually very costly. In this short paper, we address unsupervised learning for text similarity calculation. We propose a new method called Word Embedding based Edit Distance (WED), which incorporates word embedding into edit distance. Experiments on three benchmark datasets show WED outperforms state-of-the-art unsupervised methods including edit distance, TF-IDF based cosine, word embedding based cosine, Jaccard index, etc.
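
As a rough illustration of the idea in the abstract, the sketch below replaces the fixed substitution cost of word-level edit distance with one that shrinks as two words' embeddings become more similar. The abstract does not give the paper's actual cost definitions, so everything specific here is an assumption: the function name `soft_edit_distance`, the substitution cost of 1 minus cosine similarity (clipped at 0), the unit insertion and deletion costs, and the toy two-dimensional vectors. It should not be read as the authors' WED formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_edit_distance(a, b, emb):
    """Word-level edit distance with an embedding-aware substitution cost.

    a, b : lists of word tokens
    emb  : dict mapping word -> vector (hypothetical toy embeddings)

    Assumed costs (not taken from the paper): insertion = deletion = 1,
    substitution = 1 - max(0, cosine), falling back to 1 for unknown words.
    """
    def sub_cost(x, y):
        if x == y:
            return 0.0
        if x in emb and y in emb:
            return 1.0 - max(0.0, cosine(emb[x], emb[y]))
        return 1.0  # no embedding available: behave like plain edit distance

    # Standard dynamic program, keeping only the previous row.
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1.0,               # delete x
                cur[j - 1] + 1.0,            # insert y
                prev[j - 1] + sub_cost(x, y) # substitute x with y
            ))
        prev = cur
    return prev[-1]


# Toy embeddings, made up for this example only.
toy_emb = {"cat": [1.0, 0.0], "kitten": [0.8, 0.6], "car": [0.0, 1.0]}

s1 = ["the", "cat", "sat"]
s2 = ["the", "kitten", "sat"]
print(soft_edit_distance(s1, s2, toy_emb))  # embedding-aware cost
print(soft_edit_distance(s1, s2, {}))       # plain word-level edit distance
```

With these toy vectors, swapping "cat" for "kitten" costs roughly 0.2 instead of 1, while passing an empty embedding table reduces the function to ordinary word-level edit distance.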

