CL · Apr 30, 2020

Word Rotator's Distance

Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui

arXiv:2004.15003v331.21000 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses how to assess semantic overlap between texts in NLP, offering an incremental improvement over alignment-based and sentence-vector methods.

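The paper tackles textual similarity measurement with a method that decouples each word vector into its norm (a proxy for word importance) and its direction (a proxy for word meaning), then computes an alignment-based similarity using earth mover's distance. On several textual similarity datasets, the method, combined with the proposed vector converter, outperformed alignment-based approaches as well as strong baselines.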

A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate the fact that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity. Alignment-based approaches do not distinguish them, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover's distance (i.e., optimal transport cost), which we refer to as word rotator's distance. Besides, we find how to grow the norm and direction of word vectors (vector converter), which is a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines. The source code is available at https://github.com/eumesy/wrd
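The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the linked repository); the function name `word_rotators_distance` is hypothetical, the optimal transport problem is solved here with a generic linear program from SciPy as an assumed stand-in for a dedicated solver, and the vector converter step is omitted. Each sentence is assumed to be given as a matrix of nonzero word vectors.

```python
import numpy as np
from scipy.optimize import linprog


def word_rotators_distance(X, Y):
    """Illustrative sketch. X: (n, d) word vectors of one sentence,
    Y: (m, d) word vectors of the other. All vectors must be nonzero."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)

    # Norm as word importance: normalize norms into probability masses.
    a = nx / nx.sum()
    b = ny / ny.sum()

    # Direction as word meaning: transport cost is 1 - cosine similarity.
    C = 1.0 - (X / nx[:, None]) @ (Y / ny[:, None]).T
    n, m = C.shape

    # Earth mover's distance as a linear program over the transport plan T
    # (flattened row-major): row sums equal a, column sums equal b, T >= 0.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)
```

Because the masses come from the norms and the costs from the directions only, rescaling every word vector in a sentence by the same factor leaves the distance unchanged, while a single long vector in a sentence pulls more of the transport mass toward itself.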
