VIRDO: Visio-tactile Implicit Representations of Deformable Objects
This work addresses the challenge of efficient deformable object manipulation for robotics, though it appears incremental because it builds on existing implicit representation methods.
The paper tackles the problem of representing deformable objects for robotic manipulation by introducing VIRDO, an implicit multi-modal representation that uses visual and tactile data to predict object deformations, achieving high-fidelity reconstructions and generalization to unseen contact formations.
Deformable object manipulation requires computationally efficient representations that are compatible with robotic sensing modalities. In this paper, we present VIRDO: an implicit, multi-modal, and continuous representation for deformable-elastic objects. VIRDO operates directly on visual (point cloud) and tactile (reaction forces) modalities and learns rich latent embeddings of contact locations and forces to predict object deformations subject to external contacts. Here, we demonstrate VIRDO's ability to: i) produce high-fidelity cross-modal reconstructions with dense unsupervised correspondences, ii) generalize to unseen contact formations, and iii) perform state estimation with partial visio-tactile feedback.
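To make the idea of an implicit, continuous representation concrete, the following toy sketch may help. It assumes a signed-distance formulation in which the surface is the zero level set of a function, and deformation is modeled by warping query points before evaluating the undeformed shape. This formulation, the analytic sphere, the Gaussian falloff, and the names `nominal_sdf`, `deformation`, and `deformed_sdf` are all illustrative assumptions, not the architecture described in the abstract; VIRDO learns its representation from point clouds and reaction forces rather than using closed-form functions.

```python
import math

def nominal_sdf(p, radius=1.0):
    """Signed distance from point p to an undeformed sphere (negative inside).
    Hypothetical stand-in for a learned shape representation."""
    return math.sqrt(sum(c * c for c in p)) - radius

def deformation(p, contact, force, width=0.5):
    """Toy displacement field: material near the contact location moves
    along the force vector, with a Gaussian falloff in distance.
    Hypothetical stand-in for a learned, contact-conditioned model."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, contact))
    w = math.exp(-d2 / (2 * width ** 2))
    return tuple(w * f for f in force)

def deformed_sdf(p, contact, force):
    """Approximate the deformed shape by evaluating the nominal SDF at the
    query point shifted back by the local displacement (an inverse warp)."""
    u = deformation(p, contact, force)
    warped = tuple(a - b for a, b in zip(p, u))
    return nominal_sdf(warped)

if __name__ == "__main__":
    contact = (0.0, 0.0, 1.0)   # contact at the top of the sphere
    force = (0.0, 0.0, -0.2)    # pushing downward, into the object
    for z in (1.0, 0.8, -1.0):
        print(z, deformed_sdf((0.0, 0.0, z), contact, force))
```

Because the function can be queried at any 3D point, the representation is continuous: no mesh or voxel grid is stored. In this sketch the top of the sphere, which was on the surface before contact, gets a positive value (outside the object) once the force is applied, while points far from the contact are almost unchanged.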