Textual Spatial Cosine Similarity
This work addresses the need for efficient semantic similarity measures in enterprise-wide search environments, though it appears incremental as it builds on existing cosine similarity methods.
The authors tackled the problem of real-time document similarity in enterprise search by developing Textual Spatial Cosine Similarity, a method that uses word placement information to detect semantic similitude, with results showing it generalizes to include cosine similarity and paraphrasing detection as degenerate cases.
Many methods exist today for measuring document similarity, cosine similarity among them. More complex methods based on the semantic analysis of textual information are also available, but they are computationally expensive and rarely used for the real-time feeding of content, as in enterprise-wide search environments. To address these real-time constraints, we developed a new measure of document similarity called Textual Spatial Cosine Similarity, which detects similarity at the semantic level using the word placement information contained in the document. We show in this paper that two degenerate cases exist for this model: one coincides with Cosine Similarity, and the other with a paraphrasing detection model.
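For context, the standard cosine similarity that the abstract names as one degenerate case can be sketched as follows. This is only the well-known bag-of-words baseline, not the Textual Spatial Cosine Similarity measure itself, whose definition is not given in this section. The function name `cosine_similarity` and the whitespace tokenization are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())

    # Dot product over the terms the two documents share.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())

    # Squared Euclidean norms of the two frequency vectors.
    sq_a = sum(c * c for c in freq_a.values())
    sq_b = sum(c * c for c in freq_b.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0  # an empty document is treated as similar to nothing

    return dot / math.sqrt(sq_a * sq_b)


# Word order is invisible to this baseline: both sentences contain the
# same words with the same counts, so they score as identical.
score = cosine_similarity("dog bites man", "man bites dog")
```

Because the two sentences above receive the maximum score despite meaning different things, the baseline shows why word placement information could matter for a measure that aims at semantic similarity.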