DB AI · Aug 13, 2025

AmbiGraph-Eval: Can LLMs Effectively Handle Ambiguous Graph Queries?

Yuchen Tian, Kaixin Li, Hao Chen, Ziyang Luo, Hongzhan Lin, Sebastian Schelter, Lun Du, Jing Ma

arXiv:2508.09631v1 · h-index: 19

Originality: Incremental advance

AI Analysis

This work addresses a key limitation for users who rely on LLMs for database queries in real-world applications, but it is an incremental advance because it focuses on evaluating ambiguity handling rather than resolving it.

The paper tackles the problem of LLMs struggling with ambiguous queries on graph-structured data. It introduces AmbiGraph-Eval, a benchmark organized around a taxonomy of graph-query ambiguities, and finds that even top models perform poorly, which highlights a critical gap in ambiguity handling.

Large Language Models (LLMs) have recently demonstrated strong capabilities in translating natural language into database queries, especially when dealing with complex graph-structured data. However, real-world queries often contain inherent ambiguities, and the interconnected nature of graph structures can amplify these challenges, leading to unintended or incorrect query results. To systematically evaluate LLMs on this front, we propose a taxonomy of graph-query ambiguities, comprising three primary types: Attribute Ambiguity, Relationship Ambiguity, and Attribute-Relationship Ambiguity, each subdivided into Same-Entity and Cross-Entity scenarios. We introduce AmbiGraph-Eval, a novel benchmark of real-world ambiguous queries paired with expert-verified graph query answers. Evaluating 9 representative LLMs shows that even top models struggle with ambiguous graph queries. Our findings reveal a critical gap in ambiguity handling and motivate future work on specialized resolution techniques.
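
The taxonomy described in the abstract (three ambiguity types, each split into two scenarios) can be sketched as a small data structure. The Python below is a minimal illustration, not the authors' code: the names `AmbiguityType`, `Scenario`, `AmbiguityCategory`, and `all_categories` are hypothetical, and only the three type names and two scenario names are taken from the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class AmbiguityType(Enum):
    """The three primary ambiguity types named in the abstract."""
    ATTRIBUTE = "Attribute Ambiguity"
    RELATIONSHIP = "Relationship Ambiguity"
    ATTRIBUTE_RELATIONSHIP = "Attribute-Relationship Ambiguity"


class Scenario(Enum):
    """The two scenarios each type is subdivided into."""
    SAME_ENTITY = "Same-Entity"
    CROSS_ENTITY = "Cross-Entity"


@dataclass(frozen=True)
class AmbiguityCategory:
    """One cell of the taxonomy: a type paired with a scenario (hypothetical class)."""
    kind: AmbiguityType
    scenario: Scenario

    def label(self) -> str:
        # Human-readable name, e.g. "Cross-Entity Relationship Ambiguity".
        return f"{self.scenario.value} {self.kind.value}"


def all_categories() -> list[AmbiguityCategory]:
    """Enumerate every type/scenario combination (3 types x 2 scenarios)."""
    return [AmbiguityCategory(k, s) for k, s in product(AmbiguityType, Scenario)]


if __name__ == "__main__":
    for category in all_categories():
        print(category.label())
```

Enumerating the combinations this way gives six categories in total, which is one plausible way to tag benchmark items so that results could be broken down per category.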
