CL
Oct 12, 2022

Iterative Document-level Information Extraction via Imitation Learning

Microsoft
arXiv:2210.06600v3
273 citations
h-index: 60
Originality: Highly original
AI Analysis

This paper addresses document-level information extraction for tasks such as relation extraction and template extraction. It offers a novel method that improves on prior results on existing benchmarks, though its advance over earlier extraction techniques is incremental.

The paper tackles the problem of extracting complex relations, or templates, from documents, where each template maps named slots to text spans. It presents IterX, an iterative extraction model trained with imitation learning. The approach achieves state-of-the-art results on the SciREX and MUC-4 benchmarks and provides a strong baseline on the new BETTER Granular task.
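To make the notion of a template concrete, here is a minimal Python sketch of one possible representation: a typed record that maps slot names to spans of the document. The class names (`Span`, `Template`), the helper functions, the example sentence, and the slot names are all illustrative assumptions, not the data format used by IterX or by any of the benchmarks.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """A contiguous piece of the document, given as character offsets."""
    start: int  # inclusive
    end: int    # exclusive

    def text(self, document: str) -> str:
        return document[self.start:self.end]


@dataclass
class Template:
    """One template instance: a type plus a mapping from slot names to spans."""
    template_type: str
    slots: Dict[str, List[Span]] = field(default_factory=dict)


def find_span(document: str, phrase: str) -> Span:
    """Hypothetical helper: locate the first occurrence of `phrase`."""
    start = document.index(phrase)
    return Span(start, start + len(phrase))


def render(template: Template, document: str) -> Dict[str, List[str]]:
    """Resolve every span back to its surface string."""
    return {
        slot: [span.text(document) for span in spans]
        for slot, spans in template.slots.items()
    }


# Made-up document and a made-up 4-slot template (slot names are assumptions).
document = "We evaluate BERT on SQuAD and report F1 for question answering."
template = Template(
    template_type="Result",
    slots={
        "Method": [find_span(document, "BERT")],
        "Dataset": [find_span(document, "SQuAD")],
        "Metric": [find_span(document, "F1")],
        "Task": [find_span(document, "question answering")],
    },
)

print(render(template, document))
```

A document may yield zero, one, or many such `Template` objects, which is why an extractor has to decide how many to produce as well as what fills each slot.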

We present a novel iterative extraction model, IterX, for extracting complex relations, or templates (i.e., N-tuples representing a mapping from named slots to spans of text) within a document. Documents may feature zero or more instances of a template of any given type, and the task of template extraction entails identifying the templates in a document and extracting each template's slot values. Our imitation learning approach casts the problem as a Markov decision process (MDP), and relieves the need to use predefined template orders to train an extractor. It leads to state-of-the-art results on two established benchmarks -- 4-ary relation extraction on SciREX and template extraction on MUC-4 -- as well as a strong baseline on the new BETTER Granular task.
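The following toy Python sketch illustrates one way to read the MDP framing: the state is the set of templates extracted so far, an action either emits one more template or stops, and the expert accepts any gold template that has not yet been produced, so no fixed template order is needed. This is an illustrative assumption about the general idea, not the authors' implementation; every name here (`make_template`, `expert_actions`, `rollout`, `collect_demonstrations`, the slot names, the gold data) is hypothetical.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

# A template is simplified to a sorted tuple of (slot, filler) string pairs.
Template = Tuple[Tuple[str, str], ...]
State = FrozenSet[Template]          # templates extracted so far
Policy = Callable[[State], Optional[Template]]  # None means STOP


def make_template(**slots: str) -> Template:
    return tuple(sorted(slots.items()))


def expert_actions(state: State, gold: State) -> State:
    """Order-free expert: any gold template not yet extracted is acceptable.
    An empty result means STOP is the only acceptable action."""
    return gold - state


def rollout(policy: Policy, max_steps: int = 10) -> State:
    """Run the policy until it stops or the step budget is exhausted."""
    state: State = frozenset()
    for _ in range(max_steps):
        action = policy(state)
        if action is None:
            break
        state = state | {action}
    return state


def collect_demonstrations(
    policy: Policy, gold: State, max_steps: int = 10
) -> List[Tuple[State, State]]:
    """Visit states by following `policy`, recording at each visited state
    the set of actions the expert would accept there."""
    demos: List[Tuple[State, State]] = []
    state: State = frozenset()
    for _ in range(max_steps):
        demos.append((state, expert_actions(state, gold)))
        action = policy(state)
        if action is None:
            break
        state = state | {action}
    return demos


# Made-up gold templates with made-up slot names.
GOLD: State = frozenset({
    make_template(perpetrator="group A", target="embassy"),
    make_template(perpetrator="group B", target="bridge"),
})


def reverse_order_policy(state: State) -> Optional[Template]:
    """Stand-in for a learned policy: emits remaining gold templates
    in reverse sorted order, then stops."""
    remaining = sorted(GOLD - state, reverse=True)
    return remaining[0] if remaining else None


demos = collect_demonstrations(reverse_order_policy, GOLD)
for visited, acceptable in demos:
    print(len(visited), "extracted;", len(acceptable), "acceptable next templates")
```

Because the expert's acceptable set depends only on what is still missing, the same supervision applies whichever template the policy happens to emit first. A real system would replace the stand-in policy with a learned model and train it to choose among the recorded acceptable actions.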

Code Implementations: 2 repos