Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation
The work addresses the need for interaction-centric teleoperation in robotics, though the contribution appears incremental because it combines existing components such as vision-language segmentation and 3D Gaussian Splatting.
The authors tackle multilateral teleoperation by transforming CCTV streams into a shared 6-DoF world model, which yields real-time global pose estimates for multiple robots without fiducials or depth sensors.
We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared 6-DoF world model for multilateral teleoperation. By integrating vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3D Gaussian Splatting (3DGS), TeleAssist gives every operator in an interaction-centric teleoperation setup the real-time global positions and orientations of multiple robots, without fiducials or depth sensors.
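To make the weighted-PCA pose extraction step more concrete, the following is a minimal sketch of one way such a step could work; it is not taken from the paper. It assumes that the segmentation mask and monocular depth have already been back-projected into a set of 3D points for one robot, and that each point carries a confidence weight (for example, a mask score). The function name `weighted_pca_pose` and both inputs are hypothetical.

```python
import numpy as np


def weighted_pca_pose(points, weights):
    """Estimate a position and orientation from weighted 3D points.

    points  : (N, 3) array of 3D points belonging to one robot (assumed input).
    weights : (N,) array of non-negative confidences (assumed input).
    Returns the weighted centroid (3,) and a 3x3 rotation matrix whose
    columns are the principal axes, ordered by decreasing variance.
    """
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1

    # Position: weighted mean of the point cloud.
    centroid = w @ points

    # Weighted covariance of the centered points.
    centered = points - centroid
    cov = (centered * w[:, None]).T @ centered

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]

    # Flip the last axis if needed so the frame is right-handed (det = +1).
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1

    return centroid, axes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic elongated cloud: long along x, thin along z, offset in space.
    cloud = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2]) + np.array([1.0, 2.0, 3.0])
    position, rotation = weighted_pca_pose(cloud, np.ones(len(cloud)))
    print("position:", position)
    print("rotation:\n", rotation)
```

Note that principal axes are only defined up to sign, so a practical system would need an extra cue (such as a known forward direction or temporal consistency) to resolve the heading ambiguity; how TeleAssist handles this is not stated in the abstract.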