CLJan 15, 2014

Unsupervised Methods for Determining Object and Relation Synonyms on the Web

arXiv:1401.5696v1122 citations
Originality Highly original
AI Analysis

This addresses the challenge of identifying synonyms without labeled data, which is critical for improving information extraction quality on the Web.

The paper tackles the problem of synonym resolution for objects and relations in unsupervised information extraction, presenting a system called Resolver that achieves 78% precision and 68% recall for objects and 90% precision and 35% recall for relations on Web data.

The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of resolvers probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes