Moving Other Way: Exploring Word Mover Distance Extensions
This work studies incremental improvements to semantic similarity metrics, evaluated on document classification tasks.
The paper explores extensions to the Word Mover's Distance (WMD) metric by experimenting with word frequency weighting and with the geometry of the word vector space. It validates these extensions on six document classification datasets and shows that some of them achieve lower k-nearest neighbor classification error than WMD.
The word mover's distance (WMD) is a popular semantic similarity metric between two texts. This position paper studies several possible extensions of WMD. We experiment with the frequency of words in the corpus as a weighting factor and with the geometry of the word vector space. We validate the possible extensions of WMD on six document classification datasets. Some of the proposed extensions achieve a lower k-nearest neighbor classification error than WMD.
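
For readers unfamiliar with the baseline, the following is a minimal sketch of WMD in its standard formulation as an optimal transport problem between two bags of word vectors, solved here as a linear program with SciPy. The function name `wmd` and its parameters are hypothetical and chosen for illustration. The `weights_a` and `weights_b` arguments are where a corpus-frequency weighting (for example, idf-scaled counts) could be plugged in; that is an assumption about how such a weighting might be applied, not the exact scheme studied in this paper.

```python
import numpy as np
from scipy.optimize import linprog


def wmd(weights_a, weights_b, vecs_a, vecs_b):
    """Sketch of Word Mover's Distance as a transport linear program.

    weights_* : non-negative word weights per document (e.g. term counts;
                a frequency-based reweighting is an illustrative assumption).
    vecs_*    : word embeddings, one row per word, same dimensionality.
    """
    a = np.asarray(weights_a, dtype=float)
    b = np.asarray(weights_b, dtype=float)
    a = a / a.sum()  # normalize so each document carries total mass 1
    b = b / b.sum()
    va = np.asarray(vecs_a, dtype=float)
    vb = np.asarray(vecs_b, dtype=float)

    # cost[i, j] = Euclidean distance between word i of A and word j of B
    cost = np.linalg.norm(va[:, None, :] - vb[None, :, :], axis=2)
    n, m = cost.shape

    # Flow matrix T (n x m) is flattened row-major into n*m variables.
    # Row sums of T must equal a, column sums of T must equal b.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([a, b])

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun  # minimal total transport cost
```

As a sanity check, two single-word documents whose vectors are `[0, 0]` and `[3, 4]` should give a distance of approximately 5, since all mass travels between the two points. Changing the cost matrix (for instance, to a cosine-based distance) is one way to vary the geometry of the word vector space in this sketch.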