ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints

Pei-An Chen, Yong-Ching Liang, Jia-Fong Yeh, Hung-Ting Su, Yi-Ting Chen, Min Sun, Winston Hsu

arXiv:2604.14902, 33.61 citations, h-index: 9

AI Analysis

For embodied AI researchers, this work addresses the overlooked problem of planning under unspecified, dynamic affordances, providing a benchmark and a practical module to improve agent adaptability.

The paper introduces DynAfford, a benchmark for evaluating embodied agents' ability to plan under dynamic affordance constraints, and ADAPT, a module that adds explicit affordance reasoning to existing planners. Experiments show ADAPT significantly improves task success and robustness, with a fine-tuned VLM outperforming GPT-4o for affordance inference.

Intelligent embodied agents should not simply follow instructions, as real-world environments often involve unexpected conditions and exceptions. However, existing methods usually focus on directly executing instructions, without considering whether the target objects can actually be manipulated, meaning they fail to assess available affordances. To address this limitation, we introduce DynAfford, a benchmark that evaluates embodied agents in dynamic environments where object affordances may change over time and are not specified in the instruction. DynAfford requires agents to perceive object states, infer implicit preconditions, and adapt their actions accordingly. To enable this capability, we introduce ADAPT, a plug-and-play module that augments existing planners with explicit affordance reasoning. Experiments demonstrate that incorporating ADAPT significantly improves robustness and task success across both seen and unseen environments. We also show that a domain-adapted, LoRA-finetuned vision-language model used as the affordance inference backend outperforms a commercial LLM (GPT-4o), highlighting the importance of task-aligned affordance grounding.
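The abstract describes ADAPT only at a high level, so the following is a minimal illustrative sketch of the general idea of wrapping an existing planner with an affordance check, not the authors' implementation. Every name here (`Action`, `adapt_plan`, the `observe`, `infer_blocker`, and `recover` callables, and the "dirty mug" example) is a hypothetical assumption introduced for illustration. In the paper the affordance inference is performed by a vision-language model; here it is replaced by a hand-written stand-in rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    """One step of a plan, e.g. Action("fill", "mug"). Hypothetical type."""
    verb: str
    target: str


def adapt_plan(
    plan: List[Action],
    observe: Callable[[str], str],
    infer_blocker: Callable[[Action, str], Optional[str]],
    recover: Callable[[Action, str], List[Action]],
) -> List[Action]:
    """Insert recovery steps before any action whose affordance is blocked.

    observe(target)            -> current perceived state of the object
    infer_blocker(action, st)  -> name of the unmet precondition, or None
    recover(action, blocker)   -> actions intended to restore the affordance
    """
    adapted: List[Action] = []
    for action in plan:
        state = observe(action.target)
        blocker = infer_blocker(action, state)
        if blocker is not None:
            # The instruction never mentioned this precondition, so the
            # wrapper adds the steps needed before the original action.
            adapted.extend(recover(action, blocker))
        adapted.append(action)
    return adapted


# --- Toy stand-ins (assumptions, not part of the paper) ---
def demo_observe(target: str) -> str:
    return {"mug": "dirty"}.get(target, "ok")


def demo_infer_blocker(action: Action, state: str) -> Optional[str]:
    # Stand-in for model-based affordance inference.
    if action.verb == "fill" and state == "dirty":
        return "dirty"
    return None


def demo_recover(action: Action, blocker: str) -> List[Action]:
    return [Action("wash", action.target)]


base_plan = [Action("pick_up", "mug"), Action("fill", "mug")]
adapted = adapt_plan(base_plan, demo_observe, demo_infer_blocker, demo_recover)
print([f"{a.verb}({a.target})" for a in adapted])
```

The base plan is left untouched and the check runs per action, which is one way to read "plug-and-play": the planner producing `base_plan` needs no changes. A real agent would re-observe the environment after executing each step rather than rely on a static state lookup, since affordances in DynAfford may change over time.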
