IR · Nov 29, 2015

Long Concept Query on Conceptual Taxonomies

arXiv:1511.09009v1
Originality: Incremental advance
AI Analysis

This paper addresses a practical limitation of knowledge base retrieval for real-life queries, though the advance appears incremental because it builds on existing short-concept methods.

The paper tackles entity retrieval for long concept queries (LCQs) such as 'top American private university', which cannot be pre-materialized. It proposes augmenting the concept list with related concepts and ranking entities to prune likely false positives, and it reports significantly outperforming state-of-the-art methods.

This paper studies the problem of finding typical entities when a concept is given as a query. For a short concept such as university, this is a well-studied problem of retrieval from knowledge bases such as Microsoft's Probase and Google's isA database, which pre-materialize the entities found for the concept via Hearst patterns in a web corpus. However, we find that most real-life queries are long concept queries (LCQs), such as top American private university, which cannot and should not be pre-materialized. Our goal is to construct entity retrieval results for LCQs online. We argue that a naive baseline, which rewrites an LCQ into an intersection of an expanded set of its composing short concepts, yields highly precise results with extremely low recall. Instead, we propose to augment the concept list by identifying concepts related to the query concept. However, because such an increase in recall often invites false positives and lowers precision in return, we propose two techniques. First, we identify concepts with different degrees of relatedness to generate linear orderings and pairwise ordering constraints. Second, we rank entities so as to avoid conflicts with these constraints, and prune the low-ranked ones (likely false positives). With these novel techniques, our approach significantly outperforms state-of-the-art methods.
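
To make the naive baseline from the abstract concrete, here is a minimal sketch of rewriting an LCQ as an intersection of short-concept entity sets. It is an illustration under stated assumptions, not the paper's implementation: the `ISA` table, its entries, and the function name `baseline_lcq` are hypothetical, and the data is made up rather than drawn from Probase or any real knowledge base.

```python
# Toy isA table: short concept -> entities pre-materialized for it.
# Hypothetical illustration data; deliberately incomplete, as extracted
# knowledge bases tend to be (MIT is missing under "private university").
ISA = {
    "university": {"Harvard", "Stanford", "MIT", "Oxford", "UC Berkeley"},
    "american university": {"Harvard", "Stanford", "MIT", "UC Berkeley"},
    "private university": {"Harvard", "Stanford"},
    "top university": {"Harvard", "MIT", "Oxford"},
}


def baseline_lcq(short_concepts, isa=ISA):
    """Intersect the entity sets of the short concepts composing an LCQ."""
    sets = [isa.get(concept, set()) for concept in short_concepts]
    if not sets:
        return set()
    # An entity survives only if every short concept lists it.
    return set.intersection(*sets)


# "top American private university" decomposed by hand into short concepts.
result = baseline_lcq(
    ["top university", "american university", "private university"]
)
print(sorted(result))  # ['Harvard']
```

In this toy data only Harvard appears under all three short concepts, so the answer is precise but small: one gap in any single entity set is enough to drop an otherwise valid entity. That is the low-recall behavior the abstract attributes to the intersection baseline and the motivation for augmenting the concept list.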

