ROAIJun 30, 2025

PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?

arXiv:2506.23725v13 citationsh-index: 21
Originality Synthesis-oriented
AI Analysis

This addresses the reliability of VLMs for robot manipulation by identifying critical limitations, though it is incremental as it focuses on benchmarking rather than proposing a new method.

The paper tackles the problem of whether Vision-Language Models (VLMs) understand low-level physical prerequisites for robot manipulation, such as object properties, affordances, and constraints, and introduces PAC Bench, a benchmark with over 30,000 annotations, revealing significant gaps in current VLMs' abilities.

Vision-Language Models (VLMs) are increasingly pivotal for generalist robot manipulation, enabling tasks such as physical reasoning, policy generation, and failure detection. However, their proficiency in these high-level applications often assumes a deep understanding of low-level physical prerequisites, a capability that remains largely unverified. For robots to perform actions reliably, they must comprehend intrinsic object properties (e.g., material, weight), action affordances (e.g., graspable, stackable), and physical constraints (e.g., stability, reachability, or an object's state, such as being closed). Despite the widespread use of VLMs in manipulation tasks, we argue that off-the-shelf models may lack this granular, physically grounded understanding, as such prerequisites are often overlooked during training. To address this critical gap, we introduce PAC Bench, a comprehensive benchmark designed to systematically evaluate VLMs on their understanding of core Properties, Affordances, and Constraints (PAC) from a task executability perspective. PAC Bench features a diverse dataset with over 30,000 annotations, comprising 673 real-world images (115 object classes, 15 property types, and 1 to 3 affordances defined per class), 100 real-world humanoid-view scenarios, and 120 unique simulated constraint scenarios across four tasks. Our evaluations reveal significant gaps in the ability of current VLMs to grasp fundamental physical concepts, highlighting limitations in their suitability for reliable robot manipulation and pointing to key areas for targeted research. PAC Bench also serves as a standardized benchmark for rigorously evaluating physical reasoning in VLMs and guiding the development of more robust, physically grounded models for robotic applications. Project Page: https://pacbench.github.io/

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes