Towards High-Precision Depth Sensing via Monocular-Aided iToF and RGB Integration
This work addresses depth sensing limitations relevant to applications such as robotics and AR/VR, but it appears incremental: it builds on existing fusion methods and adds specific improvements.
This paper tackles low spatial resolution, limited field-of-view, and structural distortion in indirect Time-of-Flight (iToF) depth sensing by proposing an iToF-RGB fusion framework that integrates monocular depth priors. The reported experiments show improved depth accuracy and sharper edges.
This paper presents a novel iToF-RGB fusion framework designed to address the inherent limitations of indirect Time-of-Flight (iToF) depth sensing, such as low spatial resolution, limited field-of-view (FoV), and structural distortion in complex scenes. The proposed method first reprojects the narrow-FoV iToF depth map onto the wide-FoV RGB coordinate system through a precise geometric calibration and alignment module, ensuring pixel-level correspondence between modalities. A dual-encoder fusion network is then employed to jointly extract complementary features from the reprojected iToF depth and the RGB image, guided by monocular depth priors to recover fine-grained structural details and perform depth super-resolution. By integrating cross-modal structural cues and depth consistency constraints, the approach achieves enhanced depth accuracy, improved edge sharpness, and seamless FoV expansion. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed framework significantly outperforms state-of-the-art methods in terms of accuracy, structural consistency, and visual quality.
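The reprojection step can be illustrated with a standard pinhole-camera warp. The sketch below is not the paper's calibration and alignment module. It assumes an ideal pinhole model without lens distortion, z-depth in metres with 0 marking invalid pixels, and known extrinsics from the iToF frame to the RGB frame. The function name `reproject_depth` and all parameter names are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def reproject_depth(
    depth_tof: Sequence[Sequence[float]],
    k_tof: Intrinsics,
    k_rgb: Intrinsics,
    rot: Sequence[Sequence[float]],
    trans: Sequence[float],
    rgb_shape: Tuple[int, int],
) -> List[List[float]]:
    """Warp an iToF depth map into the RGB pixel grid (hypothetical helper).

    rot (3x3) and trans (3,) map points from the iToF frame to the RGB frame.
    Pixels that receive no iToF sample stay 0.0, so the result is sparse
    outside the narrow iToF field of view.
    """
    fx_t, fy_t, cx_t, cy_t = k_tof
    fx_r, fy_r, cx_r, cy_r = k_rgb
    height, width = rgb_shape
    out = [[0.0] * width for _ in range(height)]

    for v, row in enumerate(depth_tof):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip invalid iToF measurements
            # Back-project the pixel to a 3D point in the iToF camera frame.
            p = ((u - cx_t) / fx_t * z, (v - cy_t) / fy_t * z, z)
            # Apply the rigid transform into the RGB camera frame.
            x, y, z_rgb = (
                sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
                for i in range(3)
            )
            if z_rgb <= 0:
                continue  # point lies behind the RGB camera
            # Project onto the RGB image plane and snap to the nearest pixel.
            u_r = round(fx_r * x / z_rgb + cx_r)
            v_r = round(fy_r * y / z_rgb + cy_r)
            if 0 <= u_r < width and 0 <= v_r < height:
                # Z-buffer: keep the closest surface when samples collide.
                if out[v_r][u_r] == 0.0 or z_rgb < out[v_r][u_r]:
                    out[v_r][u_r] = z_rgb
    return out


# Usage: a 4x4 iToF patch warped into an 8x8 RGB grid with a shifted principal point.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tof_depth = [[2.0] * 4 for _ in range(4)]
warped = reproject_depth(
    tof_depth, (100.0, 100.0, 2.0, 2.0), (100.0, 100.0, 4.0, 4.0),
    identity, [0.0, 0.0, 0.0], (8, 8),
)
```

The warped map is valid only where iToF samples land, which is why a later stage (in the paper, the fusion network guided by monocular depth priors) is needed to fill the remaining wide-FoV region and to densify the result.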