CV · AI · Nov 8, 2025

S2ML: Spatio-Spectral Mutual Learning for Depth Completion

arXiv:2511.06033v1 · 2 citations · h-index: 12 · IEEE Transactions on Multimedia
Originality: Incremental advance
AI Analysis

This paper addresses depth completion for downstream vision tasks by exploiting the physical characteristics of raw depth images. It is an incremental advance: it builds on existing methods and adds a novel hybrid approach that combines the spatial and frequency domains.

The paper tackles incomplete depth images from RGB-D cameras by proposing a Spatio-Spectral Mutual Learning framework (S2ML) that harmonizes the spatial and frequency domains. It reports improvements of 0.828 dB and 0.834 dB over the state-of-the-art method CFormer on the NYU-Depth V2 and SUN RGB-D datasets, respectively.

The raw depth images captured by RGB-D cameras using Time-of-Flight (TOF) or structured light often suffer from incomplete depth values due to weak reflections, boundary shadows, and artifacts, which limit their applications in downstream vision tasks. Existing methods address this problem through depth completion in the image domain, but they overlook the physical characteristics of raw depth images. It has been observed that the presence of invalid depth areas alters the frequency distribution pattern. In this work, we propose a Spatio-Spectral Mutual Learning framework (S2ML) to harmonize the advantages of both spatial and frequency domains for depth completion. Specifically, we consider the distinct properties of amplitude and phase spectra and devise a dedicated spectral fusion module. Meanwhile, the local and global correlations between spatial-domain and frequency-domain features are calculated in a unified embedding space. The gradual mutual representation and refinement encourage the network to fully explore complementary physical characteristics and priors for more accurate depth completion. Extensive experiments demonstrate the effectiveness of our proposed S2ML method, outperforming the state-of-the-art method CFormer by 0.828 dB and 0.834 dB on the NYU-Depth V2 and SUN RGB-D datasets, respectively.
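The abstract's two spectral ideas, that a depth image can be split into amplitude and phase spectra and that invalid regions change its frequency distribution, can be illustrated with a minimal NumPy sketch. This is not the S2ML spectral fusion module: the synthetic depth ramp, the rectangular hole, and the helper names `amplitude_phase` and `recombine` are all assumptions made for illustration only.

```python
import numpy as np

def amplitude_phase(depth):
    """Split a 2-D depth map into its Fourier amplitude and phase spectra."""
    spectrum = np.fft.fft2(depth)
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Invert the split: amplitude * exp(i * phase), then inverse FFT."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(spectrum).real

# Synthetic smooth depth ramp (hypothetical data, not taken from the paper).
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
depth = 1.0 + 0.01 * xx + 0.02 * yy

# Simulate an invalid-depth region by zeroing a rectangular hole (assumption:
# invalid pixels are stored as 0, as is common for raw RGB-D sensor output).
holed = depth.copy()
holed[20:40, 24:44] = 0.0

amp_full, pha_full = amplitude_phase(depth)
amp_hole, pha_hole = amplitude_phase(holed)

# The decomposition is lossless: recombining recovers the original image.
roundtrip_error = np.max(np.abs(recombine(amp_full, pha_full) - depth))

# The hole changes the amplitude spectrum, i.e. the frequency distribution.
amplitude_shift = np.mean(np.abs(amp_full - amp_hole))

print(f"round-trip error: {roundtrip_error:.2e}")
print(f"mean amplitude change caused by the hole: {amplitude_shift:.4f}")
```

The round-trip check shows that amplitude and phase together carry all the information in the image, so a network can process the two spectra separately without losing the signal. The nonzero amplitude change is a toy version of the observation that invalid depth areas alter the frequency distribution pattern.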

