Visual Prompt Guided Unified Pushing Policy
This work is aimed at researchers and practitioners in robotic manipulation and offers an incremental improvement over existing pushing methods, which rely on pre-defined pushing primitives.
The paper addresses the limited efficiency and versatility of existing pushing policies in robotic manipulation. It proposes a unified pushing policy that incorporates visual prompts into a flow matching policy to generate reactive, multimodal pushing actions. Experimental results show that the policy outperforms existing baselines and serves effectively as a low-level primitive in a vision-language model (VLM) guided planning framework for table-cleaning tasks.
As one of the simplest non-prehensile manipulation skills, pushing has been widely studied as an effective means to rearrange objects. Existing approaches, however, typically rely on multi-step push plans composed of pre-defined pushing primitives with limited application scopes, which restrict their efficiency and versatility across different scenarios. In this work, we propose a unified pushing policy that incorporates a lightweight prompting mechanism into a flow matching policy to guide the generation of reactive, multimodal pushing actions. The visual prompt can be specified by a high-level planner, enabling the reuse of the pushing policy across a wide range of planning problems. Experimental results demonstrate that the proposed unified pushing policy not only outperforms existing baselines but also effectively serves as a low-level primitive within a VLM-guided planning framework to solve table-cleaning tasks efficiently.
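To make the idea of a prompt-conditioned flow matching policy more concrete, the following is a minimal sketch of how sampling from such a policy could look. It is not the authors' implementation. The names `toy_velocity` and `sample_action`, the choice of a 2D point as the prompt, and the closed-form velocity field are all assumptions made for illustration. In a real system the velocity field would be a learned network conditioned on observations and the visual prompt, and could represent multimodal action distributions, which this toy field does not.

```python
import numpy as np


def toy_velocity(action, t, prompt_xy):
    """Hypothetical stand-in for a learned velocity network v(a, t | prompt).

    For a straight-line probability path that ends at prompt_xy, the
    conditional velocity is (target - current) / (1 - t).
    """
    return (prompt_xy - action) / (1.0 - t)


def sample_action(prompt_xy, velocity_fn=toy_velocity, num_steps=10, seed=0):
    """Generate an action by Euler-integrating the velocity field from t=0 to t=1."""
    rng = np.random.default_rng(seed)
    # Flow matching sampling starts from a draw of the noise distribution.
    action = rng.standard_normal(prompt_xy.shape)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) never reaches zero
        action = action + dt * velocity_fn(action, t, prompt_xy)
    return action


# Assumed prompt format: a 2D target location chosen by a high-level planner.
prompt = np.array([0.3, -0.1])
action = sample_action(prompt)
print(action)
```

In this sketch the prompt is the only conditioning signal, so changing the prompt changes the generated action without retraining. This mirrors, in a very simplified way, how a high-level planner could reuse a single low-level policy across tasks.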