RO | AI | Sep 23, 2025

Do You Need Proprioceptive States in Visuomotor Policies?

Juntu Zhao, Wenbo Lu, Di Zhang, Yufeng Liu, Yushen Liang, Tianluo Zhang, Yifeng Cao, Junyuan Xie, Yingdong Hu, Shengjie Wang, Junliang Guo, Dequan Wang

arXiv:2509.18644v27 citations | h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses poor spatial generalization in robot manipulation. It offers researchers and practitioners an approach that improves data efficiency and cross-embodiment adaptation, though it is an incremental improvement over existing imitation-learning methods.

The study tackles visuomotor policies overfitting to proprioceptive states, which leads to poor spatial generalization, by proposing a State-free Policy that uses only visual observations. Across real-world robot tasks, the average success rate rose from 0% to 85% in height generalization and from 6% to 64% in horizontal generalization.

Imitation-learning-based visuomotor policies are widely used in robot manipulation, where visual observations and proprioceptive states are typically used together for precise control. However, in this study, we find that this common practice makes the policy overly reliant on the proprioceptive state input, which causes overfitting to the training trajectories and results in poor spatial generalization. In contrast, we propose the State-free Policy, which removes the proprioceptive state input and predicts actions conditioned only on visual observations. The State-free Policy is built in the relative end-effector action space and requires visual observations that cover the full task-relevant scene, here provided by dual wide-angle wrist cameras. Empirical results demonstrate that the State-free Policy achieves significantly stronger spatial generalization than the state-based policy: in real-world tasks such as pick-and-place, challenging shirt-folding, and complex whole-body manipulation, spanning multiple robot embodiments, the average success rate improves from 0% to 85% in height generalization and from 6% to 64% in horizontal generalization. Furthermore, the State-free Policy also shows advantages in data efficiency and cross-embodiment adaptation, enhancing its practicality for real-world deployment. Project page: https://statefreepolicy.github.io.
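To illustrate what a relative end-effector action space means, here is a minimal sketch. It is not the authors' implementation. It assumes actions are per-step position deltas only (rotation and gripper commands are ignored), and the function names and waypoint values are hypothetical. The point it shows is that a sequence of relative actions carries no absolute position, so the same sequence can be replayed from a start pose at a different height.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_relative_actions(waypoints: Sequence[Vec3]) -> List[Vec3]:
    """Convert absolute end-effector positions into per-step deltas."""
    return [
        tuple(cur_i - prev_i for prev_i, cur_i in zip(prev, cur))
        for prev, cur in zip(waypoints, waypoints[1:])
    ]


def rollout(start: Vec3, deltas: Sequence[Vec3]) -> List[Vec3]:
    """Apply relative actions step by step from an arbitrary start position."""
    poses = [start]
    for delta in deltas:
        poses.append(tuple(p + d for p, d in zip(poses[-1], delta)))
    return poses


# Hypothetical demonstration: descend 0.25 m, then move 0.25 m along x.
demo = [(0.0, 0.0, 0.5), (0.0, 0.0, 0.25), (0.25, 0.0, 0.25)]
deltas = to_relative_actions(demo)

# Replay the same deltas from a start that is 0.25 m higher (assumed shift).
shifted = rollout((0.0, 0.0, 0.75), deltas)
```

In this sketch the replayed trajectory is the demonstration shifted upward by the same offset as the start. A policy that also received the absolute proprioceptive state as input could instead tie its outputs to the positions seen in training, which is the overfitting the abstract describes.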
