CLAIOct 24, 2023

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

arXiv:2310.16040v1155 citationsh-index: 19Has Code
Originality Highly original
AI Analysis

This addresses the challenge of personalized information extraction for non-expert users, representing an incremental advancement in adapting language models to specific extraction tasks.

The paper tackles the problem of aligning information extraction systems with long-tail ad-hoc use cases for non-expert users by proposing On-Demand Information Extraction, a novel paradigm that extracts content based on user instructions into structured tables, and introduces ODIE, which substantially outperforms existing open-source models of similar size on their benchmark.

Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms the existing open-source models of similar size. Our code and dataset are released on https://github.com/yzjiao/On-Demand-IE.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes