RO Sep 26, 2021

VP-GO: a "light" action-conditioned visual prediction model

arXiv:2109.12694v1
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and realistic visual prediction models for robotic grasping, particularly of cluttered and soft objects. It appears incremental, since it builds on existing datasets and methods.

The authors address two shortcomings of visual prediction models for robotic grasping of unknown soft objects: high computational cost and insufficient stochasticity. They propose VP-GO, a lightweight stochastic action-conditioned model that performs similarly to more complex models on signal prediction metrics and qualitatively outperforms them when predicting the outcome of complex grasps.

Visual prediction models are a promising solution for visual-based robotic grasping of cluttered, unknown soft objects. Previous models from the literature are computationally greedy, which limits reproducibility; although some consider stochasticity in the prediction model, it is often too weak to catch the reality of robotics experiments involving grasping such objects. Furthermore, previous work focused on elementary movements that are not efficient to reason in terms of more complex semantic actions. To address these limitations, we propose VP-GO, a "light" stochastic action-conditioned visual prediction model. We propose a hierarchical decomposition of semantic grasping and manipulation actions into elementary end-effector movements, to ensure compatibility with existing models and datasets for visual prediction of robotic actions such as RoboNet. We also record and release a new open dataset for visual prediction of object grasping, called PandaGrasp. Our model can be pre-trained on RoboNet and fine-tuned on PandaGrasp, and performs similarly to more complex models in terms of signal prediction metrics. Qualitatively, it outperforms when predicting the outcome of complex grasps performed by our robot.
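
The hierarchical decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Move` type, the `decompose_grasp` function, the four-phase approach/descend/close/lift sequence, the gripper convention (0.0 open, 1.0 closed), and the `hover` and `max_step` parameters are all hypothetical and chosen only for illustration. The sketch assumes an elementary action is a bounded Cartesian displacement of the end effector plus a gripper command.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Move:
    """Hypothetical elementary action: end-effector displacement (m) plus gripper command."""
    dx: float
    dy: float
    dz: float
    grip: float  # assumed convention: 0.0 = open, 1.0 = closed


def split(dx: float, dy: float, dz: float, grip: float, max_step: float) -> List[Move]:
    """Split one displacement into equal steps whose per-axis size is at most max_step."""
    largest = max(abs(dx), abs(dy), abs(dz))
    # Small epsilon guards against floating-point noise pushing ceil up by one.
    n = max(1, math.ceil(largest / max_step - 1e-9))
    return [Move(dx / n, dy / n, dz / n, grip) for _ in range(n)]


def decompose_grasp(
    start: Tuple[float, float, float],
    target: Tuple[float, float, float],
    hover: float = 0.10,
    max_step: float = 0.05,
) -> List[Move]:
    """Expand a semantic 'grasp at target' action into elementary moves."""
    sx, sy, sz = start
    tx, ty, tz = target
    moves: List[Move] = []
    # Phase 1: travel to a point `hover` metres above the target, gripper open.
    moves += split(tx - sx, ty - sy, (tz + hover) - sz, 0.0, max_step)
    # Phase 2: descend vertically onto the target, gripper open.
    moves += split(0.0, 0.0, -hover, 0.0, max_step)
    # Phase 3: close the gripper without moving.
    moves.append(Move(0.0, 0.0, 0.0, 1.0))
    # Phase 4: lift back to hover height, gripper closed.
    moves += split(0.0, 0.0, hover, 1.0, max_step)
    return moves


if __name__ == "__main__":
    for move in decompose_grasp((0.0, 0.0, 0.2), (0.1, 0.0, 0.0)):
        print(move)
```

Under these assumptions, each emitted `Move` could serve as the per-frame action input of an action-conditioned predictor, while planning is expressed at the level of the semantic action.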

