CV · Mar 6

Rewis3d: Reconstruction Improves Weakly-Supervised Semantic Segmentation

Jonas Ernst, Wolfgang Boettcher, Lukas Hoyer, Jan Eric Lenssen, Bernt Schiele

arXiv:2603.06374v1 · 11.1 · h-index: 17

Predicted impact: top 38% in CV · last 90 days · Originality: Highly original

AI Analysis

This work targets the accuracy gap that remains when semantic segmentation models are trained on sparse rather than dense annotations, which is relevant to researchers and practitioners working with limited labels.

This paper tackles the problem of weakly-supervised semantic segmentation using sparse annotations. It introduces Rewis3d, a framework that leverages 3D scene reconstruction from 2D videos as an auxiliary supervisory signal, achieving state-of-the-art performance and outperforming existing approaches by 2-7% without additional labels or inference overhead.

We present Rewis3d, a framework that leverages recent advances in feed-forward 3D reconstruction to significantly improve weakly supervised semantic segmentation on 2D images. Obtaining dense, pixel-level annotations remains a costly bottleneck for training segmentation models. Alleviating this issue, sparse annotations offer an efficient weakly-supervised alternative. However, they still incur a performance gap. To address this, we introduce a novel approach that leverages 3D scene reconstruction as an auxiliary supervisory signal. Our key insight is that 3D geometric structure recovered from 2D videos provides strong cues that can propagate sparse annotations across entire scenes. Specifically, a dual student-teacher architecture enforces semantic consistency between 2D images and reconstructed 3D point clouds, using state-of-the-art feed-forward reconstruction to generate reliable geometric supervision. Extensive experiments demonstrate that Rewis3d achieves state-of-the-art performance in sparse supervision, outperforming existing approaches by 2-7% without requiring additional labels or inference overhead.
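The abstract describes enforcing semantic consistency between 2D images and reconstructed 3D point clouds through a student-teacher setup. The sketch below is only a generic mean-teacher-style illustration of that idea, not the authors' implementation: the function names, the `pixel_to_point` correspondence list, the confidence `threshold`, and the EMA `momentum` are all assumptions introduced here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(teacher, student, momentum=0.99):
    """Move teacher weights toward student weights (exponential moving average).

    Hypothetical helper: the abstract does not state how the teachers are updated.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def cross_modal_consistency(student_logits_2d, teacher_logits_3d,
                            pixel_to_point, threshold=0.9):
    """Mean cross-entropy between 2D student predictions and confident
    pseudo-labels from a 3D teacher.

    pixel_to_point[i] is the index of the reconstructed 3D point that pixel i
    corresponds to, or None if the pixel has no reconstructed point.
    (Assumed data layout, not taken from the paper.)
    """
    losses = []
    for pix, pt in enumerate(pixel_to_point):
        if pt is None:
            continue  # no geometric correspondence for this pixel
        teacher_probs = softmax(teacher_logits_3d[pt])
        confidence = max(teacher_probs)
        if confidence < threshold:
            continue  # skip uncertain teacher predictions
        label = teacher_probs.index(confidence)
        student_probs = softmax(student_logits_2d[pix])
        losses.append(-math.log(student_probs[label]))
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch the loss is small when the 2D student agrees with the 3D teacher at corresponding locations and large when it disagrees; a symmetric term (2D teacher supervising a 3D student) would follow the same pattern with the roles swapped.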

