Document-level Entity-based Extraction as Template Generation
This work addresses a key bottleneck in automatic knowledge acquisition from text for various domains by improving extraction accuracy, though it is incremental as it builds on existing generative methods.
The paper tackles the problem of modeling long-term dependencies in document-level entity-based extraction by proposing a generative framework that formulates the tasks as template generation, achieving new state-of-the-art results with F1 score improvements of +3.26% for role-filler entity extraction, +4.8% for binary relation extraction, and +2.7% for 4-ary relation extraction.
Document-level entity-based extraction (EE), which aims to extract entity-centric information such as entity roles and entity relations, is key to automatic knowledge acquisition from text corpora in various domains. Most document-level EE systems build extractive models, which struggle to model long-term dependencies among entities at the document level. To address this issue, we propose a generative framework for two document-level EE tasks: role-filler entity extraction (REE) and relation extraction (RE). We first formulate both tasks as a template generation problem, allowing models to efficiently capture cross-entity dependencies, exploit label semantics, and avoid the exponential computational complexity of identifying N-ary relations. A novel cross-attention guided copy mechanism, TopK Copy, is incorporated into a pre-trained sequence-to-sequence model to improve its ability to identify key information in the input document. Experiments on the MUC-4 and SciREX datasets show new state-of-the-art results on REE (+3.26%), binary RE (+4.8%), and 4-ary RE (+2.7%) in F1 score.
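The template generation formulation can be illustrated with a minimal sketch. It is not the paper's implementation: the marker tokens, the separator, the role names, and the helper names (linearize, parse, candidate_tuples) are all hypothetical and chosen only for illustration. The sketch shows how a set of role-filler entities might be flattened into a single target string for a sequence-to-sequence model and then recovered from generated text. It also counts how many candidate tuples an extractive model would have to score if, as assumed here, it enumerated every unordered N-tuple of entities.

```python
from math import comb

# Hypothetical marker tokens; the paper's actual template vocabulary may differ.
ROLE_OPEN, ROLE_CLOSE = "<role>", "</role>"
ENT_SEP = " ; "  # assumes entity strings contain neither ';' nor ':'


def linearize(record):
    """Flatten {role: [entity, ...]} into one target string for a seq2seq model."""
    parts = []
    for role, entities in record.items():
        parts.append(f"{ROLE_OPEN} {role} : {ENT_SEP.join(entities)} {ROLE_CLOSE}")
    return " ".join(parts)


def parse(template):
    """Invert linearize(): recover {role: [entity, ...]} from generated text."""
    record = {}
    for chunk in template.split(ROLE_OPEN)[1:]:
        body = chunk.split(ROLE_CLOSE)[0].strip()
        role, _, fillers = body.partition(":")
        record[role.strip()] = [e.strip() for e in fillers.split(";") if e.strip()]
    return record


def candidate_tuples(num_entities, arity):
    """Unordered N-tuples an exhaustive extractive scorer would enumerate (assumption)."""
    return comb(num_entities, arity)


# Illustrative record; role names and entities are made up for this example.
record = {"PerpInd": ["guerrillas"], "Target": ["embassy", "power station"]}
target = linearize(record)
print(target)
print(parse(target) == record)
print(candidate_tuples(20, 2), candidate_tuples(20, 4))
```

Under these assumptions, a document with 20 entities yields 190 candidate pairs but 4845 candidate 4-tuples, whereas the generated template grows only with the number of relations or role fillers actually present in the document.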