RPD: A Distance Function Between Word Embeddings
This work addresses a gap in understanding how different sets of word embeddings relate to one another, a question relevant to researchers and practitioners in natural language processing. The contribution appears incremental, since it builds on existing embedding methods.
The authors quantify the differences between word embedding spaces with a new metric, Relative pairwise inner Product Distance (RPD). RPD provides a unified scale for comparison, and the authors use it to study systematically how algorithms, training processes, and corpora affect embeddings.
It is well understood that different algorithms, training processes, and corpora produce different word embeddings. However, less is known about the relation between different embedding spaces, i.e., how far different sets of embeddings deviate from each other. In this paper, we propose a novel metric called Relative pairwise inner Product Distance (RPD) to quantify the distance between different sets of word embeddings. This metric has a unified scale for comparing different sets of word embeddings. Based on the properties of RPD, we systematically study the relations between word embeddings produced by different algorithms and investigate the influence of different training processes and corpora. The results shed light on poorly understood aspects of word embeddings and justify RPD as a measure of the distance between embedding spaces.
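The abstract names the metric but does not state its formula. As a rough illustration only, the sketch below assumes that a "relative pairwise inner product distance" compares the matrices of pairwise inner products (Gram matrices) of two embedding sets over the same vocabulary, and scales the squared Frobenius norm of their difference by the norms of the two matrices. The function name `rpd_sketch`, the row normalization step, and the factor of 1/2 are assumptions made for this example and may differ from the definition in the paper.

```python
import numpy as np


def rpd_sketch(e1: np.ndarray, e2: np.ndarray) -> float:
    """Assumed form of a relative pairwise inner product distance.

    e1, e2: arrays of shape (n_words, dim1) and (n_words, dim2), where
    row i of each array is the vector for the same word. The dimensions
    dim1 and dim2 may differ; the vocabulary (rows) must be aligned.
    """
    if e1.shape[0] != e2.shape[0]:
        raise ValueError("both embedding sets must cover the same words")

    # Assumption: normalize each word vector to unit length so that the
    # inner products below are cosine similarities.
    e1 = e1 / np.linalg.norm(e1, axis=1, keepdims=True)
    e2 = e2 / np.linalg.norm(e2, axis=1, keepdims=True)

    # Pairwise inner product (Gram) matrices, each of shape (n_words, n_words).
    g1 = e1 @ e1.T
    g2 = e2 @ e2.T

    # Squared Frobenius norm of the difference, scaled by the norms of
    # the two Gram matrices so that the result is a relative quantity.
    numerator = np.linalg.norm(g1 - g2, "fro") ** 2
    denominator = np.linalg.norm(g1, "fro") * np.linalg.norm(g2, "fro")
    return 0.5 * float(numerator / denominator)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(100, 50))   # hypothetical embedding set A
    b = rng.normal(size=(100, 300))  # hypothetical embedding set B
    print(rpd_sketch(a, a))          # identical sets give a distance of zero
    print(rpd_sketch(a, b))          # unrelated random sets give a positive value
```

Under this assumed form, the two sets need not have the same dimensionality, because both Gram matrices are indexed by words rather than by embedding coordinates. Rotating one embedding set by an orthogonal matrix also leaves its Gram matrix unchanged, so the sketch treats rotated copies of the same embeddings as being at distance zero.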