CL
May 20, 2025

From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning

arXiv:2505.14425v2
1 citation
h-index: 4
IJCNLP-AACL
Originality: Incremental advance
AI Analysis

This paper addresses a generalization challenge for AI systems in grounded environments, but the advance is incremental because it focuses on specific spatial tasks.

The paper examines why instruction-tuned LLMs struggle to generalize from synthetic to human-authored instructions in spatial reasoning tasks. It finds that performance degrades significantly on complex tasks and provides a detailed error analysis.

Instruction-tuned large language models (LLMs) have shown strong performance on a variety of tasks; however, generalizing from synthetic to human-authored instructions in grounded environments remains a challenge for them. In this work, we study generalization challenges in spatial grounding tasks where models interpret and translate instructions for building object arrangements on a $2.5$D grid. We fine-tune LLMs using only synthetic instructions and evaluate their performance on a benchmark dataset containing both synthetic and human-written instructions. Our results reveal that while models generalize well on simple tasks, their performance degrades significantly on more complex tasks. We present a detailed error analysis of the gaps in instruction generalization.

