EgoExo++: Integrating On-demand Exocentric Visuals with 2.5D Ground Surface Estimation for Interactive Teleoperation of Underwater ROVs
For operators of underwater ROVs, this work provides a practical, geometry-driven method to enhance situational awareness and teleoperation performance in complex environments.
EgoExo++ synthesizes on-demand exocentric views and estimates 2.5D ground surfaces from egocentric camera feeds to improve underwater ROV teleoperation. In user studies, it achieved 16% faster missions, 5-fold reduction in path deviation, and fewer collisions compared to baseline egocentric teleoperation.
Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict human operators' field-of-view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we first propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. We further propose EgoExo++, which extends beyond 2D exocentric view synthesis (EgoExo) to augment a piecewise planar 2.5D ground surface estimation on-the-fly. Its anchor-free aerial viewpoint supports ground-relative reasoning, such as clearance and terrain-based navigation marker following. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes it portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. To assess operational benefits, we conduct two user studies with simulation and real-world data, each involving 15 participants, comparing baseline egocentric teleoperation and EgoExo++. Results indicate improved system usability (SUS), reduced perceived workload (NASA-TLX), and significant gains in objective teleoperation performance, including 16% faster missions, 5-fold reduction in path deviation ratio, and fewer collision events (2 vs. 5 across trials). Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics.