LG · ML · Jan 18, 2013

Latent Relation Representations for Universal Schemas

Sebastian Riedel, Limin Yao, Andrew McCallum

arXiv:1301.4293v2

Originality: Highly original

AI Analysis

The work removes relation extraction's dependence on pre-existing datasets that share the target schema, which benefits researchers and practitioners in natural language processing and knowledge base construction.

The paper tackles relation extraction with a universal schema that combines surface-form predicates and existing database relations, which avoids both a fixed target schema and manual annotation. It reports substantially higher accuracy than traditional classification and outperforms state-of-the-art distant supervision systems.

Traditional relation extraction predicts relations within some fixed and finite target schema. Machine learning approaches to this task require either manual annotation or, in the case of distant supervision, existing structured sources of the same schema. The need for existing datasets can be avoided by using a universal schema: the union of all involved schemas (surface form predicates as in OpenIE, and relations in the schemas of pre-existing databases). This schema has an almost unlimited set of relations (due to surface forms), and supports integration with existing structured data (through the relation types of existing databases). To populate a database of such schema we present a family of matrix factorization models that predict affinity between database tuples and relations. We show that this achieves substantially higher accuracy than the traditional classification approach. More importantly, by operating simultaneously on relations observed in text and in pre-existing structured DBs such as Freebase, we are able to reason about unstructured and structured data in mutually-supporting ways. By doing so our approach outperforms state-of-the-art distant supervision systems.
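The matrix factorization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a plain logistic factorization trained by SGD, with one randomly sampled unobserved cell per observed cell treated as a negative. The toy entity pairs, the relation names, and the functions `train` and `affinity` are all hypothetical and introduced only for illustration. Rows are entity pairs (database tuples), and columns mix surface-form predicates with a database relation, as in a universal schema.

```python
import numpy as np

# Hypothetical toy universal schema: surface patterns plus one DB relation.
RELATIONS = ["X professor at Y", "X historian at Y", "db:employee", "X visited Y"]
N_PAIRS = 6  # rows are entity pairs (tuples); the indices are illustrative
# Observed (pair, relation) cells. Pair 3 has a surface pattern but no DB fact.
OBSERVED = [(0, 0), (0, 2), (1, 0), (1, 2), (2, 1), (2, 2), (3, 0), (4, 3), (5, 3)]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train(observed, n_pairs, n_relations, dim=4, epochs=200,
          lr=0.1, reg=0.01, seed=0):
    """Fit latent vectors so that observed cells receive high affinity.

    Assumption: unobserved cells are sampled as negatives, one per
    observed cell and from the same relation column.
    """
    rng = np.random.default_rng(seed)
    pair_vecs = rng.normal(scale=0.1, size=(n_pairs, dim))
    rel_vecs = rng.normal(scale=0.1, size=(n_relations, dim))
    observed_set = set(observed)
    # For each relation, the pairs that were never observed with it.
    unobserved = {
        r: [p for p in range(n_pairs) if (p, r) not in observed_set]
        for r in range(n_relations)
    }
    for _ in range(epochs):
        for pair, rel in observed:
            updates = [(pair, 1.0)]
            if unobserved[rel]:
                updates.append((int(rng.choice(unobserved[rel])), 0.0))
            for p, label in updates:
                p_vec = pair_vecs[p].copy()
                r_vec = rel_vecs[rel].copy()
                # Gradient of the log-likelihood of a logistic model.
                err = label - sigmoid(p_vec @ r_vec)
                pair_vecs[p] += lr * (err * r_vec - reg * p_vec)
                rel_vecs[rel] += lr * (err * p_vec - reg * r_vec)
    return pair_vecs, rel_vecs


def affinity(pair_vecs, rel_vecs):
    """Predicted affinity for every (pair, relation) cell, in (0, 1)."""
    return sigmoid(pair_vecs @ rel_vecs.T)


pair_vecs, rel_vecs = train(OBSERVED, N_PAIRS, len(RELATIONS))
scores = affinity(pair_vecs, rel_vecs)
print(np.round(scores, 2))
```

Every cell of `scores`, observed or not, gets an affinity, so the same matrix can be read column-wise to rank candidate tuples for a database relation such as the hypothetical `db:employee`, or for a surface pattern. This is the sense in which text and structured data inform each other through the shared latent vectors.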
