IR · AI · Mar 11

Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval

arXiv:2603.10700v1 · 9.31 citations · h-index: 6
Predicted impact: top 85% in IR (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses inefficient knowledge retrieval in AI systems across four domains (editorial, legal, travel, and e-commerce). It is an incremental advance that relies on specific optimizations to how documents are represented for retrieval.

Retrieval-augmented generation (RAG) systems usually ignore the structured metadata attached to documents. The paper investigates whether structured linked data improves retrieval accuracy and answer quality, and finds that an enhanced entity page format yields accuracy gains of +29.6% for standard RAG and +29.8% for agentic RAG.
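The structured metadata in question is Schema.org vocabulary, which is commonly embedded in a page's HTML as a JSON-LD script element. The sketch below shows what such an "HTML with JSON-LD" document might look like. It is a minimal illustration, not the paper's template: the hotel entity, its example.org URIs, and the helper names `to_jsonld_script` and `html_with_jsonld` are hypothetical.

```python
import json
from html import escape

# Hypothetical Schema.org entity (travel domain); all values are illustrative
# and are not taken from the paper's dataset.
entity = {
    "@context": "https://schema.org",
    "@type": "Hotel",
    "@id": "https://example.org/entity/hotel-42",
    "name": "Example Harbour Hotel",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Lisbon",
        "addressCountry": "PT",
    },
    # A link to another dereferenceable entity page (hypothetical URI).
    "containedInPlace": {"@id": "https://example.org/entity/lisbon"},
}


def to_jsonld_script(data: dict) -> str:
    """Serialize an entity as a JSON-LD <script> element."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value cannot close the script element early;
    # JSON parsers read "\/" back as "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


def html_with_jsonld(title: str, body_text: str, data: dict) -> str:
    """Build a minimal HTML page whose <head> carries the JSON-LD markup."""
    return (
        "<!DOCTYPE html>\n<html><head>"
        f"<title>{escape(title)}</title>\n{to_jsonld_script(data)}\n"
        f"</head><body><p>{escape(body_text)}</p></body></html>"
    )


page = html_with_jsonld("Example Harbour Hotel", "A waterfront hotel.", entity)
print(page)
```

The visible text stays unchanged; the markup only adds a machine-readable description that a retriever or agent can parse alongside it.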

Retrieval-Augmented Generation (RAG) systems typically treat documents as flat text, ignoring the structured metadata and linked relationships that knowledge graphs provide. In this paper, we investigate whether structured linked data, specifically Schema.org markup and dereferenceable entity pages served by a Linked Data Platform, can improve retrieval accuracy and answer quality in both standard and agentic RAG systems. We conduct a controlled experiment across four domains (editorial, legal, travel, e-commerce) using Vertex AI Vector Search 2.0 for retrieval and the Google Agent Development Kit (ADK) for agentic reasoning. Our experimental design tests seven conditions: three document representations (plain HTML, HTML with JSON-LD, and an enhanced agentic-optimized entity page) crossed with two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal), plus an Enhanced+ condition that adds rich navigational affordances and entity interlinking. Our results reveal that while JSON-LD markup alone provides only modest improvements, our enhanced entity page format, incorporating llms.txt-style agent instructions, breadcrumbs, and neural search capabilities, achieves substantial gains: +29.6% accuracy improvement for standard RAG and +29.8% for the full agentic pipeline. The Enhanced+ variant, with richer navigational affordances, achieves the highest absolute scores (accuracy: 4.85/5, completeness: 4.55/5), though the incremental gain over the base enhanced format is not statistically significant. We release our dataset, evaluation framework, and enhanced entity page templates to support reproducibility.
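The agentic mode's multi-hop link traversal can be pictured as a bounded breadth-first walk over dereferenceable entity pages. What follows is a minimal sketch under stated assumptions, not the paper's ADK pipeline: an in-memory dict stands in for HTTP dereferencing against a Linked Data Platform, the seed URIs are assumed to come from the vector-search step, and every link is followed up to a hop limit, whereas in the paper an agent reasons about which links to follow. `EntityPage`, `multi_hop_retrieve`, and the example.org URIs are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EntityPage:
    """Hypothetical stand-in for a dereferenced entity page."""
    uri: str
    text: str
    links: list[str] = field(default_factory=list)


def multi_hop_retrieve(seed_uris, pages, max_hops=2, max_pages=10):
    """Breadth-first walk from seed URIs, following links up to max_hops.

    `pages` maps URI -> EntityPage and simulates dereferencing; unknown
    URIs (dangling links) are skipped. Assumption: all links are followed,
    rather than letting an agent choose among them.
    """
    seen: set[str] = set()
    collected: list[EntityPage] = []
    queue = deque((uri, 0) for uri in seed_uris)
    while queue and len(collected) < max_pages:
        uri, hops = queue.popleft()
        if uri in seen or uri not in pages:
            continue
        seen.add(uri)
        page = pages[uri]
        collected.append(page)
        if hops < max_hops:
            for link in page.links:
                if link not in seen:
                    queue.append((link, hops + 1))
    return collected


# Toy linked graph (hypothetical legal-domain entities).
E = "https://example.org/entity/"
pages = {
    E + "article-1": EntityPage(E + "article-1", "News article on a ruling.", [E + "case-7"]),
    E + "case-7": EntityPage(E + "case-7", "Case record.", [E + "court-3", E + "article-1"]),
    E + "court-3": EntityPage(E + "court-3", "Court entity.", [E + "city-9"]),
    E + "city-9": EntityPage(E + "city-9", "City entity."),
}

context = multi_hop_retrieve([E + "article-1"], pages, max_hops=2)
print([p.uri for p in context])
```

With `max_hops=2`, the walk reaches the case and the court but not the city three links away; the hop and page limits bound how much context the generator receives.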
