Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models
This paper addresses efficient on-device information extraction for NLP applications and represents an incremental improvement over existing methods.
The paper tackles the challenge of deploying large language models (LLMs) for information extraction on resource-constrained devices. It proposes DLISC, a two-stage approach that improves both schema identification and schema-aware extraction, and reports notable gains in effectiveness and efficiency across multiple datasets.
Information extraction (IE) plays a crucial role in natural language processing (NLP) by converting unstructured text into structured knowledge. Deploying computationally intensive large language models (LLMs) for IE on resource-constrained devices is challenging because of hallucinations, limited context length, and high latency, particularly when handling diverse extraction schemas. To address these challenges, we propose a two-stage IE approach tailored to on-device LLMs, called Dual-LoRA with Incremental Schema Caching (DLISC), which improves both the effectiveness and the efficiency of schema identification and schema-aware extraction. Specifically, DLISC uses an Identification LoRA module to retrieve the schemas most relevant to a given query and an Extraction LoRA module to perform information extraction based on the selected schemas. To accelerate extraction inference, Incremental Schema Caching reduces redundant computation, substantially improving efficiency. Extensive experiments across multiple IE datasets demonstrate notable improvements in both effectiveness and efficiency.
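The two-stage flow and the caching idea can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The Identification LoRA is replaced by a hypothetical word-overlap scorer (`relevance`), and the Extraction LoRA's decoding step is omitted. A schema's cached "state" is simply its token list, standing in for the key/value states an on-device LLM would reuse. The names `identify_schemas`, `IncrementalSchemaCache`, and `tokens_encoded` are hypothetical. The counter shows the intended saving: each schema is encoded the first time it is selected and reused afterwards, while each query is still encoded fresh.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return text.lower().replace("_", " ").split()


def relevance(query: str, schema: str) -> float:
    # Hypothetical stand-in for the Identification LoRA: fraction of the
    # schema's words that appear in the query (not a learned relevance model).
    q, s = set(tokenize(query)), set(tokenize(schema))
    return len(q & s) / len(s) if s else 0.0


def identify_schemas(query: str, pool: List[str], k: int = 2) -> List[str]:
    """Stage 1: keep at most k schemas with non-zero relevance, best first."""
    ranked = sorted(pool, key=lambda s: relevance(query, s), reverse=True)
    return [s for s in ranked[:k] if relevance(query, s) > 0]


@dataclass
class IncrementalSchemaCache:
    """Caches per-schema prefix states so each schema is encoded only once.

    Here a 'state' is just the schema's token list; in an on-device LLM it
    would be something like the attention key/value states for those tokens.
    """
    states: Dict[str, List[str]] = field(default_factory=dict)
    tokens_encoded: int = 0  # running count of tokens that needed encoding

    def _encode(self, tokens: List[str]) -> List[str]:
        self.tokens_encoded += len(tokens)  # toy cost model: 1 unit per token
        return tokens

    def prefix_for(self, schemas: List[str]) -> List[str]:
        prefix: List[str] = []
        for s in schemas:
            if s not in self.states:
                # Miss: encode this schema once and remember its state.
                self.states[s] = self._encode(tokenize(s))
            # Reuse the stored state; no further encoding cost for this schema.
            prefix.extend(self.states[s])
        return prefix

    def extract_context(self, query: str, schemas: List[str]) -> List[str]:
        """Stage 2 input: cached schema prefix followed by the freshly encoded query."""
        return self.prefix_for(schemas) + self._encode(tokenize(query))


pool = ["person name", "organization name", "event date", "drug dosage"]
cache = IncrementalSchemaCache()
for query in [
    "find the person name and organization name in this memo",
    "which organization name appears next to each person name",
    "record the drug dosage for each person name",
]:
    selected = identify_schemas(query, pool)          # stage 1: schema selection
    context = cache.extract_context(query, selected)  # stage 2: extraction input
    # A real Extraction LoRA would decode structured output from `context` here.
print(cache.tokens_encoded)  # 33; re-encoding every selected schema would cost 39
```

In this run, only the first occurrence of each schema pays an encoding cost, so the total is 33 token encodings instead of 39. In an actual transformer, reusing separately computed key/value states may also require care with position encodings and cross-schema attention, which this bookkeeping sketch does not model.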