RGB-Only Reconstruction of Tabletop Scenes for Collision-Free Manipulator Control
This work addresses safe robot manipulation in tabletop environments without depth sensors, which could reduce hardware cost and complexity. The contribution is incremental in that it builds on existing NeRF and control methods.
The paper enables collision-free robot manipulator control from RGB camera input alone: it reconstructs 3D scene geometry with a NeRF-like method, computes a signed distance function for obstacle avoidance, and achieves successful control in real-world tabletop scenarios.
We present a system for collision-free control of a robot manipulator that uses only RGB views of the world. Perceptual input of a tabletop scene is provided by multiple images from an RGB camera (without depth) that is either handheld or mounted on the robot end effector. A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean full signed distance function (ESDF) is computed. A model predictive control algorithm is then used to control the manipulator to reach a desired pose while avoiding obstacles in the ESDF. We show results on a real dataset collected and annotated in our lab.
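To make the role of the ESDF concrete, the following is a minimal sketch, not the implementation used in this work. It assumes the reconstructed geometry has already been discretized into a small 2D occupancy grid, computes a signed distance by brute force (positive in free space, negative inside obstacles), and shows one plausible way a controller could penalize proximity to obstacles. The names `esdf_2d` and `collision_cost`, the grid resolution, the sign convention, and the hinge-shaped penalty are all illustrative assumptions.

```python
import math


def esdf_2d(occupied, cell_size=1.0):
    """Brute-force signed distance on a 2D occupancy grid (illustrative only).

    Positive values: distance from a free cell to the nearest occupied cell.
    Negative values: distance from an occupied cell to the nearest free cell.
    Distances are measured between cell centers and scaled by cell_size.
    """
    rows, cols = len(occupied), len(occupied[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    occ = [p for p in cells if occupied[p[0]][p[1]]]
    free = [p for p in cells if not occupied[p[0]][p[1]]]

    def nearest(p, targets):
        # Distance to the closest target cell; infinite if there are none.
        if not targets:
            return math.inf
        return cell_size * min(
            math.hypot(p[0] - q[0], p[1] - q[1]) for q in targets
        )

    field = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        if occupied[r][c]:
            field[r][c] = -nearest((r, c), free)
        else:
            field[r][c] = nearest((r, c), occ)
    return field


def collision_cost(distance, margin):
    """Hypothetical hinge penalty: zero beyond the margin, linear inside it."""
    return max(0.0, margin - distance)


# A 3x3 grid with a single obstacle cell in the center, 5 cm cells.
grid = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
field = esdf_2d(grid, cell_size=0.05)
```

A controller that samples candidate trajectories could look up `field` at points along the robot body and add `collision_cost(d, margin)` to the trajectory cost, so that trajectories passing within `margin` of an obstacle are disfavored. The brute-force search is quadratic in the number of cells and is only suitable for illustration; a practical system would use a linear-time distance transform on a 3D grid.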