CL · Sep 15, 2025

AgenticIE: An Adaptive Agent for Information Extraction from Complex Regulatory Documents

arXiv:2509.11773v2 · 1 citation · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making EU construction product performance documents accessible to both machines and human users, despite multilingual content and structurally diverse layouts. It is a domain-specific, incremental improvement.

The paper tackles information extraction from complex, multilingual regulatory documents with varying layouts. It develops an adaptive agentic system that outperforms the baselines on ROUGE (0.783 versus 0.703 and 0.608) and shows better cross-lingual stability.
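The summary reports ROUGE scores without saying which variant was used. As a rough illustration only, the sketch below computes a ROUGE-L style F1 from the longest common subsequence of tokens. The choice of ROUGE-L, the whitespace tokenization, the lowercasing, and the function names `lcs_length` and `rouge_l_f1` are all assumptions made for this example, not details taken from the paper.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L style F1 with naive whitespace tokenization (an assumption)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens that are matched
    recall = lcs / len(ref)      # share of reference tokens that are recovered
    return 2 * precision * recall / (precision + recall)
```

For instance, with the reference "reaction to fire class A1" and the candidate "fire class A1", the common subsequence has three tokens, so precision is 1.0, recall is 0.6, and the F1 is 0.75.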

Declaration of Performance (DoP) documents, mandated by EU regulation, certify the performance of construction products. Two challenges stand in the way of making DoPs accessible to machines and humans through automated key-value pair extraction (KVP) and question answering (QA): (1) although some of their content is standardized, DoPs vary widely in layout, schema, and format; (2) both users and documents are multilingual. Existing static or LLM-only Information Extraction (IE) pipelines fail to adapt to this diversity of document structures and users. Our domain-specific, agentic system addresses these challenges through a planner-executor-responder architecture. The system infers user intent, detects document language and modality, and orchestrates tools dynamically for robust, traceable reasoning while avoiding tool misuse and execution loops. Our agent outperforms baselines (ROUGE: 0.783 vs. 0.703/0.608) with better cross-lingual stability (17-point vs. 21-26-point variation).
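The abstract names a planner-executor-responder architecture but gives no implementation details. The following is a minimal sketch of that general pattern under stated assumptions, not the authors' system: the tool names, the rule-based planner, the keyword-based language check, and the call budget are all hypothetical stand-ins chosen so the example runs without an LLM. It illustrates two of the ideas mentioned above: rejecting unknown tools (one simple guard against tool misuse) and skipping repeated calls under a budget (one simple guard against execution loops), while keeping a trace of every decision.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str, str], str]  # (query, document text) -> result


def detect_language(query: str, text: str) -> str:
    # Hypothetical keyword check standing in for a real language detector.
    return "de" if "leistung" in text.lower() else "en"


def extract_kvp(query: str, text: str) -> str:
    # Treat every "key: value" line as one key-value pair.
    pairs = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            pairs.append(f"{key.strip()}={value.strip()}")
    return "; ".join(pairs)


def answer_question(query: str, text: str) -> str:
    # Return the value of the first key that shares a word with the query.
    words = {w.strip("?.,").lower() for w in query.split()}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and words & set(key.lower().split()):
            return value.strip()
    return "not found"


TOOLS: Dict[str, Tool] = {
    "detect_language": detect_language,
    "extract_kvp": extract_kvp,
    "answer_question": answer_question,
}


def plan(query: str, text: str) -> List[str]:
    """Toy planner: infer intent (QA vs. KVP) from the query's punctuation."""
    task = "answer_question" if query.strip().endswith("?") else "extract_kvp"
    return ["detect_language", task]


def execute(steps: List[str], query: str, text: str,
            tools: Dict[str, Tool], max_calls: int = 5
            ) -> Tuple[Dict[str, str], List[str]]:
    """Run planned steps with guards against unknown tools and repeated calls."""
    results: Dict[str, str] = {}
    trace: List[str] = []
    for step in steps:
        if step not in tools:
            trace.append(f"rejected {step}: unknown tool")
        elif step in results:
            trace.append(f"skipped {step}: repeated call")
        elif len(results) >= max_calls:
            trace.append("stopped: call budget exhausted")
            break
        else:
            results[step] = tools[step](query, text)
            trace.append(f"ran {step}")
    return results, trace


def respond(results: Dict[str, str], trace: List[str]) -> str:
    """Compose the final answer together with the trace that produced it."""
    answer = results.get("answer_question") or results.get("extract_kvp") or "no result"
    language = results.get("detect_language", "unknown")
    return f"[{language}] {answer} (trace: {' -> '.join(trace)})"


def run_agent(query: str, text: str) -> str:
    results, trace = execute(plan(query, text), query, text, TOOLS)
    return respond(results, trace)
```

Given a toy document such as `"Declaration of Performance\nReaction to fire: A1\nProduct: Mineral wool"`, calling `run_agent("What is the reaction to fire?", doc)` returns `[en] A1 (trace: ran detect_language -> ran answer_question)`. Keeping the planner, executor, and responder as separate functions means the guards live in one place and the trace can be inspected independently of the final answer.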

