CV | Jan 2, 2025

PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation

arXiv:2501.01121v1 | 2 citations | h-index: 11
AI Analysis

This work addresses the need for faster and lighter depth estimation models in computer vision applications. It is incremental, building on prior methods.

The paper tackles the problem of computational inefficiency in high-resolution depth estimation by introducing PatchRefiner V2, which uses lightweight encoders and novel modules to reduce model size and inference time while achieving state-of-the-art accuracy and speed on datasets like UnrealStereo4K.

While current high-resolution depth estimation methods achieve strong results, they often suffer from computational inefficiencies due to reliance on heavyweight models and multiple inference steps, increasing inference time. To address this, we introduce PatchRefiner V2 (PRV2), which replaces heavy refiner models with lightweight encoders. This reduces model size and inference time but introduces noisy features. To overcome this, we propose a Coarse-to-Fine (C2F) module with a Guided Denoising Unit that refines and denoises the refiner features, and a Noisy Pretraining strategy that pretrains the refiner branch so that its potential is fully exploited. Additionally, we introduce a Scale-and-Shift Invariant Gradient Matching (SSIGM) loss to enhance synthetic-to-real domain transfer. PRV2 outperforms state-of-the-art depth estimation methods on UnrealStereo4K in both accuracy and speed, with fewer parameters and faster inference. It also shows improved depth boundary delineation on real-world datasets such as Cityscapes, ScanNet++, and KITTI, demonstrating its versatility across domains.
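
The abstract names the SSIGM loss but does not define it. The sketch below only illustrates the general idea the name suggests, and it is an assumption, not the paper's formulation: first align the prediction to the target with a least-squares scale and shift, then penalize differences between their spatial gradients. The function names (`align_scale_shift`, `gradient_matching`, `ssi_gradient_loss`), the single-scale L1 gradient term, and the normalization by valid-pixel count are all hypothetical choices made for illustration.

```python
import numpy as np


def align_scale_shift(pred, target, mask):
    """Least-squares scale s and shift t such that s * pred + t ~ target on valid pixels."""
    p = pred[mask]
    g = target[mask]
    design = np.stack([p, np.ones_like(p)], axis=1)  # columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(design, g, rcond=None)
    return s, t


def gradient_matching(pred, target, mask):
    """Mean L1 difference of horizontal and vertical finite differences (assumed form)."""
    diff = np.where(mask, pred - target, 0.0)
    # Only count a gradient where both neighbouring pixels are valid.
    gx = np.abs(diff[:, 1:] - diff[:, :-1]) * (mask[:, 1:] & mask[:, :-1])
    gy = np.abs(diff[1:, :] - diff[:-1, :]) * (mask[1:, :] & mask[:-1, :])
    n_valid = max(int(mask.sum()), 1)
    return (gx.sum() + gy.sum()) / n_valid


def ssi_gradient_loss(pred, target, mask=None):
    """Illustrative scale-and-shift invariant gradient loss; not the paper's exact SSIGM."""
    if mask is None:
        mask = np.ones(pred.shape, dtype=bool)
    s, t = align_scale_shift(pred, target, mask)
    return gradient_matching(s * pred + t, target, mask)
```

Under these assumptions, a prediction that differs from the target only by an affine transform (for example, relative depth versus metric depth) gives a loss near zero, while errors in edge structure remain penalized. That property is presumably what makes such a loss useful when supervising with pseudo-labels whose scale cannot be trusted.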

