AIDB · Apr 21

DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning

arXiv:2604.18964 · 7.8 · h-index: 2
Predicted impact: top 78% in AI · last 90 days · Originality: Synthesis-oriented
AI Analysis

For LLM practitioners and data warehouse users, this benchmark reveals limitations in complex multi-hop reasoning.

DW-Bench evaluates LLMs on graph-topology reasoning over data warehouse schemas with FK and data-lineage edges. Tool-augmented methods outperform static ones but plateau on hard compositional subtypes.

This paper introduces DW-Bench, a new benchmark that evaluates large language models (LLMs) on graph-topology reasoning over data warehouse schemas, explicitly integrating both foreign-key (FK) and data-lineage edges. The benchmark comprises 1,046 automatically generated, verifiably correct questions across five schemas. Experiments show that tool-augmented methods substantially outperform static approaches but plateau on hard compositional subtypes.
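To make "graph-topology reasoning" over FK and lineage edges concrete, here is a minimal sketch of the kind of question whose answer can be checked automatically. It is not the benchmark's generator: the table names, the edge lists, and the `hops` helper are all hypothetical, and shortest-path length is only one plausible question type.

```python
from collections import deque

# Hypothetical toy schema: every table name and edge is invented for illustration.
# FK edges point from the referencing table to the referenced table.
FK_EDGES = [
    ("orders", "customers"),
    ("order_items", "orders"),
    ("order_items", "products"),
]
# Lineage edges point from a source table to the table derived from it.
LINEAGE_EDGES = [
    ("orders", "fact_sales"),
    ("order_items", "fact_sales"),
    ("fact_sales", "rpt_revenue"),
]


def build_graph(edges):
    """Build a directed adjacency map from (source, target) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph


def hops(graph, start, goal):
    """Return the fewest directed edges from start to goal, or None if unreachable."""
    if start not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


fk_only = build_graph(FK_EDGES)
combined = build_graph(FK_EDGES + LINEAGE_EDGES)

# Question: "How many hops from order_items to rpt_revenue?"
# The ground truth comes from breadth-first search, so it is verifiable by construction.
print(hops(combined, "order_items", "rpt_revenue"))
print(hops(fk_only, "order_items", "rpt_revenue"))
```

In this toy schema the report table is reachable only when lineage edges are included, which shows why a question can require both edge types rather than FK edges alone.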

