RO · Mar 10

TiPToP: A Modular Open-Vocabulary Planning System for Robotic Manipulation

arXiv:2603.09971v161.94 citations · h-index: 76 · Has Code
Predicted impact: top 1% in RO · last 90 days
Originality: Incremental advance
AI Analysis

TiPToP gives robotic manipulation researchers and practitioners an easy-to-use, open-source system that integrates learning and planning. The advance is incremental, since the system is assembled from existing components.

TiPToP is a modular system that combines pretrained vision foundation models with an existing task and motion planner to solve multi-step manipulation tasks from RGB images and natural-language instructions. Across 28 tabletop tasks in simulation and the real world, it matches or outperforms $π_{0.5}\text{-DROID}$, a vision-language-action model fine-tuned on 350 hours of embodiment-specific demonstrations, while itself requiring zero robot data.

We present TiPToP, an extensible modular system that combines pretrained vision foundation models with an existing Task and Motion Planner (TAMP) to solve multi-step manipulation tasks directly from input RGB images and natural-language instructions. Our system aims to be simple and easy-to-use: it can be installed and run on a standard DROID setup in under one hour and adapted to new embodiments with minimal effort. We evaluate TiPToP -- which requires zero robot data -- over 28 tabletop manipulation tasks in simulation and the real world and find it matches or outperforms $π_{0.5}\text{-DROID}$, a vision-language-action (VLA) model fine-tuned on 350 hours of embodiment-specific demonstrations. TiPToP's modular architecture enables us to analyze the system's failure modes at the component level. We analyze results from an evaluation of 173 trials and identify directions for improvement. We release TiPToP open-source to further research on modular manipulation systems and tighter integration between learning and planning. Project website and code: https://tiptop-robot.github.io
