CV · Oct 19, 2024

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

arXiv:2410.14980v2 · 19 citations · h-index: 13 · Has Code · NIPS
AI Analysis

This paper addresses accurate depth estimation from single images, for applications such as robotics and autonomous driving. It is an incremental improvement that introduces a novel frequency-based approach.

The paper tackles monocular depth estimation by proposing DCDepth, a framework that estimates depth in the discrete cosine domain to model local correlations within each patch. It predicts frequency components progressively, from low to high, and reports state-of-the-art performance on the NYU-Depth-V2, TOFDC, and KITTI datasets.

In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task. Moving beyond conventional pixel-wise depth estimation in the spatial domain, our approach estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain. This unique formulation allows for the modeling of local depth correlations within each patch. Crucially, the frequency transformation segregates the depth information into various frequency components, with low-frequency components encapsulating the core scene structure and high-frequency components detailing the finer aspects. This decomposition forms the basis of our progressive strategy, which begins with the prediction of low-frequency components to establish a global scene context, followed by successive refinement of local details through the prediction of higher-frequency components. We conduct comprehensive experiments on NYU-Depth-V2, TOFDC, and KITTI datasets, and demonstrate the state-of-the-art performance of DCDepth. Code is available at https://github.com/w2kun/DCDepth.
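The low-to-high frequency decomposition described in the abstract can be illustrated with a small sketch. This is not the DCDepth network and is not taken from the authors' repository: it only applies a 2D discrete cosine transform to a known, synthetic depth patch and reconstructs it while admitting progressively higher frequency bands. The 8×8 patch size, the grouping of coefficients by diagonal band (`u + v`), the synthetic patch, and all function names are assumptions made for illustration.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(patch):
    """2D DCT of a square patch: C = D P D^T."""
    d = dct_matrix(len(patch))
    return matmul(matmul(d, patch), transpose(d))


def idct2(coeffs):
    """Inverse 2D DCT: P = D^T C D (D is orthonormal)."""
    d = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(d), coeffs), d)


def rmse(a, b):
    n = len(a)
    total = sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n))
    return math.sqrt(total / (n * n))


def make_patch(n=8):
    """Hypothetical depth patch (metres): a tilted plane with a step edge."""
    return [[2.0 + 0.1 * i + 0.05 * j + (0.5 if j >= 5 else 0.0)
             for j in range(n)] for i in range(n)]


def progressive_reconstructions(patch):
    """Reconstruct the patch while admitting higher-frequency bands.

    Band b keeps every coefficient with u + v <= b, so band 0 is the
    patch mean and the last band restores the full patch.
    """
    n = len(patch)
    coeffs = dct2(patch)
    results = []
    for band in range(2 * n - 1):
        kept = [[coeffs[u][v] if u + v <= band else 0.0 for v in range(n)]
                for u in range(n)]
        results.append((band, rmse(patch, idct2(kept))))
    return results


if __name__ == "__main__":
    for band, err in progressive_reconstructions(make_patch()):
        print(f"bands 0..{band}: RMSE = {err:.6f} m")
```

Because the transform is orthonormal, the reconstruction error cannot increase as bands are added: the first bands recover the coarse plane, and the later ones sharpen the step edge. In the paper the coefficients are predicted from an image rather than computed from ground-truth depth, so the sketch shows only the frequency ordering that the progressive strategy relies on.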

Code Implementations: 1 repo
