LG DL IR ML · Jul 11, 2012

An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching

Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay

arXiv:1207.4157v1 · 144 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving citation matching and field extraction accuracy for researchers and database systems. It is an incremental advance because it builds on existing graphical-model methods.

The paper tackles the problem of integrating information extraction and coreference resolution, which are typically performed separately, by using conditionally-trained undirected graphical models. On a dataset of research paper citations, this approach significantly reduced error by using extraction uncertainty to improve coreference accuracy, and coreference to improve extraction accuracy.
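One way to picture "using extraction uncertainty to improve coreference" is to let a pairwise coreference feature average over several candidate extractions instead of committing to a single best one. The sketch below is illustrative only and is not the paper's method: it assumes a hypothetical upstream extractor that returns an N-best list of `(title, probability)` pairs per citation, and it uses token-set Jaccard similarity as a stand-in feature. Both the representation and the names `jaccard` and `expected_similarity` are assumptions.

```python
from typing import List, Tuple

# Hypothetical representation (assumption): an N-best list of
# (extracted_title, probability) pairs for one citation string.
NBest = List[Tuple[str, float]]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two strings (illustrative feature)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def expected_similarity(x: NBest, y: NBest) -> float:
    """Expected Jaccard similarity of two citations' titles, treating the two
    N-best lists as independent distributions (each renormalized to sum to 1)."""
    zx = sum(p for _, p in x)
    zy = sum(p for _, p in y)
    total = 0.0
    for sx, px in x:
        for sy, py in y:
            total += (px / zx) * (py / zy) * jaccard(sx, sy)
    return total


if __name__ == "__main__":
    a = [("conditional models of extraction", 0.7), ("conditional models", 0.3)]
    b = [("conditional models", 1.0)]
    print(expected_similarity(a, b))  # about 0.65 = 0.7 * 0.5 + 0.3 * 1.0
```

A hard top-1 extraction would give this pair a similarity of 0.5. The expected value credits the 30% chance that the shorter segmentation is correct.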

Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference based on conditionally-trained undirected graphical models. We discuss the advantages of conditional probability training, and of a coreference model structure based on graph partitioning. On a data set of research paper citations, we show significant reduction in error by using extraction uncertainty to improve coreference citation matching accuracy, and using coreference to improve the accuracy of the extracted fields.
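To make "a coreference model structure based on graph partitioning" concrete, here is a minimal sketch under stated assumptions. Citations are graph nodes, and each pair carries a real-valued weight. In a conditionally-trained model this weight would come from learned feature weights, but here it is simply given: positive means "likely the same paper" and negative means "likely different". A partition's score is the sum of weights over pairs placed in the same cluster. The greedy agglomerative search below is one simple approximate inference choice, not necessarily the procedure used in the paper. The names `partition_score` and `greedy_partition` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Pairwise weights keyed by (i, j) with i < j; missing pairs default to 0.
Weights = Dict[Tuple[int, int], float]


def w(weights: Weights, i: int, j: int) -> float:
    """Look up the weight of an unordered pair of citation indices."""
    return weights.get((min(i, j), max(i, j)), 0.0)


def partition_score(clusters: List[List[int]], weights: Weights) -> float:
    """Sum of pairwise weights over all pairs that share a cluster."""
    return sum(w(weights, i, j) for c in clusters for i, j in combinations(c, 2))


def greedy_partition(n: int, weights: Weights) -> List[List[int]]:
    """Start from singletons; repeatedly merge the two clusters whose merge
    most increases the partition score; stop when no merge helps."""
    clusters = [[i] for i in range(n)]
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(clusters)), 2):
            # Merging A and B adds exactly the cross-cluster pair weights.
            gain = sum(w(weights, i, j) for i in clusters[a] for j in clusters[b])
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return clusters
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]


if __name__ == "__main__":
    weights = {(0, 1): 2.0, (1, 2): -3.0, (0, 2): -1.0, (2, 3): 1.5}
    result = greedy_partition(4, weights)
    print(result)                             # [[0, 1], [2, 3]]
    print(partition_score(result, weights))   # 3.5
```

A partition is transitive by construction. If citations 0 and 1 share a cluster, and 1 and 2 share a cluster, then 0 and 2 do as well. Independent pairwise match decisions cannot guarantee this consistency.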

