AI · May 14

Zero-Shot Goal Recognition with Large Language Models

arXiv:2605.15333 · 43.4
Predicted impact: top 65% in AI · last 90 days
Originality: Incremental advance
AI Analysis

For the planning community, it provides the first systematic zero-shot evaluation of LLMs on goal recognition, revealing uneven competence that serves as a benchmark for LLMs' foundational planning knowledge.

This paper evaluates frontier LLMs as zero-shot goal recognizers on classical PDDL benchmarks, finding that some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of evidence.

Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on world-knowledge exploitation rather than genuine symbolic reasoning. Goal recognition is a complementary abductive task structurally better suited to LLM strengths: it consists of evaluating consistency with world knowledge rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a fundamental difference in evidence integration rather than domain familiarity. These findings position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.
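The evaluation described in the abstract, accuracy as a function of how much evidence the recognizer sees, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the paper's protocol: the `Problem` fields, the prefix-based truncation of observations, the observability levels, and `overlap_recognizer`, a toy token-overlap heuristic standing in for an actual LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Problem:
    """Hypothetical container for one goal recognition instance."""
    candidate_goals: Sequence[str]   # goal hypotheses, e.g. PDDL-style atoms
    observations: Sequence[str]      # full observed action sequence
    true_goal: str                   # ground-truth goal among the candidates


# A recognizer maps (candidate goals, observed actions) to one chosen goal.
Recognizer = Callable[[Sequence[str], Sequence[str]], str]


def observe_prefix(observations: Sequence[str], fraction: float) -> List[str]:
    """Keep the first `fraction` of the observations (assumed truncation scheme)."""
    if not observations:
        return []
    k = max(1, round(fraction * len(observations)))
    return list(observations[:k])


def accuracy_by_level(
    problems: Sequence[Problem],
    recognizer: Recognizer,
    levels: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 1.0),
) -> Dict[float, float]:
    """Fraction of problems where the recognizer picks the true goal, per level."""
    results = {}
    for level in levels:
        correct = sum(
            recognizer(p.candidate_goals, observe_prefix(p.observations, level))
            == p.true_goal
            for p in problems
        )
        results[level] = correct / len(problems)
    return results


def _tokens(text: str) -> set:
    """Split a parenthesised atom or action into its bare tokens."""
    return set(text.replace("(", " ").replace(")", " ").split())


def overlap_recognizer(goals: Sequence[str], observed: Sequence[str]) -> str:
    """Toy stand-in for an LLM: choose the goal sharing the most tokens with the evidence.

    Ties are broken in favour of the earlier candidate.
    """
    seen = set()
    for action in observed:
        seen |= _tokens(action)
    return max(goals, key=lambda g: len(seen & _tokens(g)))
```

In a real setup the `recognizer` argument would wrap a zero-shot prompt to the model under test; the loop over `levels` is what exposes whether accuracy scales with evidence or stays flat.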
