IR · LG · Aug 14, 2013

Normalized Google Distance of Multisets with Applications

arXiv:1308.3177v1 · 9 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific problem in semantic distance measurement that is relevant to researchers and practitioners in information retrieval and natural language processing. It appears incremental, since it extends an existing method to multisets.

The authors address a limitation of the Normalized Google Distance (NGD), which is defined only for pairs of search terms, by proposing an NGD for finite multisets that better captures the semantics shared across several terms. They demonstrate its advantages through applications and comparisons with the pairwise NGD.

Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate page counts. The earlier NGD between pairs of search terms (including phrases) is not sufficient for all applications. We propose an NGD of finite multisets of search terms that is better for many applications. This gives a relative semantics shared by a multiset of search terms. We give applications and compare the results with those obtained using the pairwise NGD. The derivation of the NGD method is based on Kolmogorov complexity.
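The abstract does not state the formula, so the snippet below is only a minimal sketch under an assumption. It takes the multiset NGD to be NGD(X) = (max over x in X of log f(x) minus log f(X)) / (log N minus min over x in X of log f(x)), where f(S) is the aggregate page count for pages containing every term in S and N is the normalizing total. With exactly two terms this reduces to the usual pairwise NGD. The function name `ngd`, the `COUNTS` table and `N` are hypothetical, and the counts are made up for illustration rather than taken from any search engine.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, Iterable


def ngd(terms: Iterable[str], f: Callable[[FrozenSet[str]], int], n: float) -> float:
    """Sketch of the NGD of a multiset of search terms (assumed form).

    f(S) returns the number of pages containing every term in S;
    n is the normalizing total N. Repeated terms are collapsed, which
    assumes a duplicated term does not change any page count.
    """
    distinct = frozenset(terms)
    if not distinct:
        raise ValueError("need at least one search term")
    singles = [f(frozenset([t])) for t in distinct]
    joint = f(distinct)
    if joint <= 0 or min(singles) <= 0:
        raise ValueError("page counts must be positive")
    logs = [math.log(c) for c in singles]
    # Numerator: how much rarer the full conjunction is than the most frequent term.
    num = max(logs) - math.log(joint)
    # Denominator: normalize by the least frequent single term.
    den = math.log(n) - min(logs)
    return num / den


# Hypothetical page counts for three made-up terms; not real data.
COUNTS = {
    frozenset({"a"}): 1000,
    frozenset({"b"}): 2000,
    frozenset({"c"}): 500,
    frozenset({"a", "b"}): 200,
    frozenset({"a", "c"}): 100,
    frozenset({"b", "c"}): 150,
    frozenset({"a", "b", "c"}): 50,
}
N = 1_000_000

if __name__ == "__main__":
    # Pairwise NGD for every pair, then the NGD of the whole multiset.
    for pair in combinations("abc", 2):
        print(pair, round(ngd(pair, COUNTS.__getitem__, N), 4))
    print(("a", "b", "c"), round(ngd("abc", COUNTS.__getitem__, N), 4))
```

Because the distance is a ratio of logarithms, the choice of log base cancels out. Passing the page-count lookup as a callable keeps the sketch independent of any particular search engine or database.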

