Distributional Measures of Semantic Distance: A Survey
This survey addresses the relative lack of attention paid to distributional measures compared with WordNet-based ones. Its contribution is incremental: it synthesizes existing work without introducing new methods.
This paper surveys distributional measures of semantic distance, which estimate semantic relationships from raw text. It highlights their strengths, such as applicability to resource-poor languages and the ability to mimic both semantic similarity and semantic relatedness, and discusses how to bring them more in line with human judgment.
The ability to mimic human notions of semantic distance has widespread applications. Some measures rely only on raw text (distributional measures) and some rely on knowledge sources such as WordNet. Although extensive studies have been performed to compare WordNet-based measures with human judgment, the use of distributional measures as proxies to estimate semantic distance has received little attention. Even though they have traditionally performed poorly when compared to WordNet-based measures, they lay claim to certain uniquely attractive features, such as their applicability in resource-poor languages and their ability to mimic both semantic similarity and semantic relatedness. Therefore, this paper presents a detailed study of distributional measures. Particular attention is paid to fleshing out the strengths and limitations of both WordNet-based and distributional measures, and to how distributional measures of distance can be brought more in line with human notions of semantic distance. We conclude with a brief discussion of recent work on hybrid measures.
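As a minimal illustration of what "relying only on raw text" can mean, the sketch below builds co-occurrence profiles from a toy corpus and compares two words by cosine distance. This is one common choice of distributional measure, used here only as an example; it is not presented as a measure endorsed by the survey. The function names, the window size of 2, and the three-sentence corpus are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_profiles(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it.

    Hypothetical helper: whitespace tokenization and a symmetric window are
    simplifying assumptions, not choices taken from the survey.
    """
    profiles = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[target][tokens[j]] += 1
    return profiles


def cosine_distance(p, q):
    """Return 1 - cosine similarity of two co-occurrence Counters."""
    dot = sum(count * q[word] for word, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # no context evidence: treat as maximally distant
    return 1.0 - dot / (norm_p * norm_q)


# Toy corpus (assumed): no lexical resource such as WordNet is consulted.
corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the pilot landed the plane",
]
profiles = cooccurrence_profiles(corpus)
d_close = cosine_distance(profiles["doctor"], profiles["nurse"])
d_far = cosine_distance(profiles["doctor"], profiles["plane"])
print(f"doctor-nurse: {d_close:.3f}  doctor-plane: {d_far:.3f}")
```

In this toy corpus "doctor" and "nurse" occur in identical contexts, so their distance comes out smaller than that between "doctor" and "plane". Because the only input is text, the same procedure applies to any language for which a corpus exists, which is the resource-poor-language advantage noted above.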