CL | Oct 12, 2022

Iterative Document-level Information Extraction via Imitation Learning

Yunmo Chen, William Gantt, Weiwei Gu, Tongfei Chen, Aaron Steven White, Benjamin Van Durme

Microsoft

arXiv:2210.06600v3 | 23.4273 citations | h-index: 60 | Has Code

Originality: Highly original

AI Analysis

This paper addresses document-level information extraction for tasks such as relation and template extraction. It offers a novel method that improves on prior results on existing benchmarks, though its advance over earlier extraction techniques is incremental.

The paper tackles the problem of extracting complex relations, or templates, from documents, where each template maps named slots to text spans. It presents IterX, an iterative extraction model trained with imitation learning. The approach achieves state-of-the-art results on the SciREX and MUC-4 benchmarks and sets a strong baseline on the new BETTER Granular task.

We present a novel iterative extraction model, IterX, for extracting complex relations, or templates (i.e., N-tuples representing a mapping from named slots to spans of text) within a document. Documents may feature zero or more instances of a template of any given type, and the task of template extraction entails identifying the templates in a document and extracting each template's slot values. Our imitation learning approach casts the problem as a Markov decision process (MDP), and removes the need for predefined template orders when training an extractor. It leads to state-of-the-art results on two established benchmarks -- 4-ary relation extraction on SciREX and template extraction on MUC-4 -- as well as a strong baseline on the new BETTER Granular task.
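To make the framing in the abstract concrete, here is a minimal Python sketch of a template as a slot-to-span mapping and of extraction as an episode of "emit a template or stop" decisions. It is an illustration under stated assumptions, not IterX's actual architecture: the `Template` class, the `Policy` signature, the token-offset span representation, and the toy expert are all hypothetical names introduced here. The expert simply returns any gold template that has not been extracted yet, which is one simple way to avoid committing to a fixed template order.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

# (start, end) token offsets; this span representation is an assumption.
Span = Tuple[int, int]


@dataclass(frozen=True)
class Template:
    """One template instance: a type plus (slot name, span) pairs."""
    template_type: str
    slots: FrozenSet[Tuple[str, Span]]


# Hypothetical policy signature: given the document tokens and the templates
# extracted so far (the state), return the next template (the action),
# or None to signal STOP.
Policy = Callable[[List[str], List[Template]], Optional[Template]]


def extract(doc: List[str], policy: Policy, max_steps: int = 10) -> List[Template]:
    """Roll out one episode, emitting one template per step until STOP."""
    extracted: List[Template] = []
    for _ in range(max_steps):
        action = policy(doc, extracted)
        if action is None:  # STOP ends the episode; zero templates is valid
            break
        extracted.append(action)
    return extracted


def make_expert(gold: List[Template]) -> Policy:
    """Toy expert: any gold template not yet extracted is an acceptable action."""
    def expert(doc: List[str], extracted: List[Template]) -> Optional[Template]:
        remaining = [t for t in gold if t not in extracted]
        return remaining[0] if remaining else None
    return expert


if __name__ == "__main__":
    doc = "rebels attacked the village wounding civilians".split()
    # Slot names below are illustrative, not taken from any benchmark schema.
    gold = [
        Template("attack", frozenset({("perpetrator", (0, 1))})),
        Template("attack", frozenset({("victim", (5, 6))})),
    ]
    for template in extract(doc, make_expert(gold)):
        print(template.template_type, sorted(template.slots))
```

In an imitation learning setup, a learned policy would take the place of `make_expert(gold)` at inference time, with the expert's choices serving as supervision during training; how IterX realizes either component is not described in the abstract.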
