TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation
TacSE3 tracks object motion during robotic in-hand manipulation when vision is occluded, using low-texture visuotactile images, and provides a physically interpretable signal for compensation.
TacSE3 estimates SE(3) rigid-body motion from low-texture visuotactile images by converting tactile observations into a decoupled force field, and it tracks rotation across axes and object geometries. Sensing with paired fingertip sensors reduces translation-rotation ambiguity and provides a lightweight compensation signal that improves disturbance tolerance in manipulation tasks without retraining the base policy.
Robotic in-hand manipulation requires reliable object-motion tracking under frequent visual occlusion, yet low-texture visuotactile images provide few stable correspondences for conventional image- or geometry-matching methods. This paper presents TacSE3, a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity, supports rotation tracking across axes and object geometries, and provides a lightweight compensation signal that improves disturbance tolerance in downstream manipulation tasks without retraining the base policy.
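To make the two geometric ingredients of the abstract concrete, the sketch below shows one plausible way to derive planar translation from contact-centroid motion and to accumulate incremental rigid-body motion on SE(3). It is an illustration under stated assumptions, not the TacSE3 implementation: the grid representation of the normal-force channel, the force-weighted centroid, the `cell_size` scale, and all function names are hypothetical. The rotation increment is taken as a given rotation vector, because the abstract does not specify how shear-related responses are mapped to rotation, and the translation is inserted directly as the translational part of a homogeneous transform.

```python
import math


def contact_centroid(normal_force, threshold=0.0):
    """Force-weighted centroid (row, col) of grid cells above threshold.

    `normal_force` is assumed to be a 2D list holding the normal component
    of the force field; this layout is a hypothetical choice.
    """
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(normal_force):
        for c, f in enumerate(row):
            if f > threshold:
                total += f
                row_sum += f * r
                col_sum += f * c
    if total == 0.0:
        return None  # no contact detected
    return row_sum / total, col_sum / total


def planar_translation(prev_force, curr_force, cell_size=1.0):
    """Planar shift (dx, dy) of the contact centroid between two frames.

    `cell_size` (hypothetical) converts grid cells to metric units.
    """
    prev_c = contact_centroid(prev_force)
    curr_c = contact_centroid(curr_force)
    if prev_c is None or curr_c is None:
        return None
    dx = (curr_c[1] - prev_c[1]) * cell_size  # columns map to x
    dy = (curr_c[0] - prev_c[0]) * cell_size  # rows map to y
    return dx, dy


def rotation_matrix(rotvec):
    """Rodrigues' formula: rotation vector (axis * angle) to a 3x3 matrix."""
    theta = math.sqrt(sum(w * w for w in rotvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (w / theta for w in rotvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def se3_increment(rotvec, translation):
    """Build a 4x4 homogeneous transform from a rotation vector and translation."""
    rot = rotation_matrix(rotvec)
    return [
        rot[0] + [translation[0]],
        rot[1] + [translation[1]],
        rot[2] + [translation[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a @ b of two 4x4 transforms (apply b in a's frame)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]
```

In use, a running pose estimate would be updated once per frame as `pose = compose(pose, se3_increment(rotvec, (dx, dy, 0.0)))`, where `dx, dy` come from `planar_translation` and `rotvec` from whatever rotation estimator is available. Composing small increments keeps the estimate on SE(3) by construction, although numerical drift in the rotation block may still call for periodic re-orthonormalization over long sequences.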