AI | Jan 27

Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning

arXiv:2601.20014v1

Originality: Incremental advance

AI Analysis

The paper addresses a critical issue for AI systems in domains such as task planning, where incomplete information can cause failures, though it is an incremental improvement over existing planning methods.

The paper tackles inference-time planning with large language models under partial observability, where missing preconditions lead to hallucinated facts or constraint violations. It introduces Self-Querying Bidirectional Categorical Planning (SQ-BCP), which reduces resource-violation rates to 14.9% on WikiHow and 5.8% on RecipeNLG, compared with 26.0% and 15.7% for the best baseline.

Inference-time planning with large language models frequently breaks under partial observability: when task-critical preconditions are not specified at query time, models tend to hallucinate missing facts or produce plans that violate hard constraints. We introduce Self-Querying Bidirectional Categorical Planning (SQ-BCP), which explicitly represents precondition status (Sat/Viol/Unk) and resolves unknowns via (i) targeted self-queries to an oracle/user or (ii) bridging hypotheses that establish the missing condition through an additional action. SQ-BCP performs bidirectional search and invokes a pullback-based verifier as a categorical certificate of goal compatibility, while using distance-based scores only for ranking and pruning. We prove that when the verifier succeeds and hard constraints pass deterministic checks, accepted plans are compatible with goal requirements; under bounded branching and finite resolution depth, SQ-BCP finds an accepting plan when one exists. Across WikiHow and RecipeNLG tasks with withheld preconditions, SQ-BCP reduces resource-violation rates to 14.9% and 5.8% (vs. 26.0% and 15.7% for the best baseline), while maintaining competitive reference quality.
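To make the three-valued precondition status concrete, here is a minimal Python sketch of how an unknown precondition might be resolved by a self-query or by a bridging action. It is an illustration only, not the paper's implementation. The names Status, Precondition, resolve, oracle, and bridges are hypothetical, as is the choice to try the self-query before a bridging action. The bidirectional search and the pullback-based verifier are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Status(Enum):
    SAT = "Sat"    # precondition known to hold
    VIOL = "Viol"  # precondition known to fail
    UNK = "Unk"    # precondition not specified at query time


@dataclass
class Precondition:
    name: str
    status: Status = Status.UNK


def resolve(
    pre: Precondition,
    oracle: Callable[[str], Optional[bool]],
    bridges: Dict[str, str],
) -> Tuple[Status, List[str]]:
    """Resolve one precondition; return (status, extra actions to prepend).

    Assumed order (not taken from the paper): ask the oracle first,
    then fall back to a bridging action that establishes the condition.
    """
    if pre.status is not Status.UNK:
        return pre.status, []

    # (i) Targeted self-query: the oracle answers True, False, or None.
    answer = oracle(pre.name)
    if answer is True:
        return Status.SAT, []

    # (ii) Bridging: an additional action that makes the condition hold.
    if pre.name in bridges:
        return Status.SAT, [bridges[pre.name]]

    # No bridge available: a negative answer is a violation,
    # and no answer leaves the status unknown instead of guessing.
    return (Status.VIOL if answer is False else Status.UNK), []


# Hypothetical toy data for a recipe-style task.
facts = {"has_oven": True, "has_flour": False, "has_milk": False}
bridges = {"has_flour": "buy flour", "oven_preheated": "preheat oven"}


def toy_oracle(name: str) -> Optional[bool]:
    return facts.get(name)


for name in ["has_oven", "has_flour", "oven_preheated", "has_milk", "has_eggs"]:
    status, extra = resolve(Precondition(name), toy_oracle, bridges)
    print(name, status.value, extra)
```

In this sketch a precondition is never silently assumed to hold: it ends as Sat only after an oracle confirmation or an explicit bridging action, which mirrors the abstract's point that hallucinated missing facts are the failure mode to avoid.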
