CV · Jul 1, 2025

Masks make discriminative models great again!

Tianshi Cao, Marie-Julie Rakotosaona, Ben Poole, Federico Tombari, Michael Niemeyer

arXiv:2507.00916v1 · h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the challenge of 3D scene reconstruction from single images for computer vision applications, offering an incremental improvement by decoupling lifting from completion.

The paper tackles photorealistic 3D scene reconstruction from a single image by focusing on the image-to-3D lifting component. It uses visibility masks to exclude unseen areas from the training loss. This significantly improves reconstruction quality in visible regions, and the model remains competitive with state-of-the-art models when evaluated on complete scenes.

We present Image2GS, a novel approach that addresses the challenging problem of reconstructing photorealistic 3D scenes from a single image by focusing specifically on the image-to-3D lifting component of the reconstruction process. By decoupling the lifting problem (converting an image to a 3D model representing what is visible) from the completion problem (hallucinating content not present in the input), we create a more deterministic task suitable for discriminative models. Our method employs visibility masks derived from optimized 3D Gaussian splats to exclude areas not visible from the source view during training. This masked training strategy significantly improves reconstruction quality in visible regions compared to strong baselines. Notably, despite being trained only on masked regions, Image2GS remains competitive with state-of-the-art discriminative models trained on full target images when evaluated on complete scenes. Our findings highlight the fundamental struggle discriminative models face when fitting unseen regions and demonstrate the advantages of addressing image-to-3D lifting as a distinct problem with specialized techniques.
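The abstract's core idea is masked training: the loss only counts target-view pixels that are visible from the source view. The sketch below illustrates that idea in NumPy and is not the authors' implementation. The paper derives its masks from optimized 3D Gaussian splats. Here, as an assumed stand-in, `visibility_mask` marks a target pixel visible if its back-projected 3D point lands inside the source image and agrees with the source depth map within a relative tolerance. It assumes a shared pinhole intrinsic matrix `K` and a 4x4 target-to-source transform. The L1 photometric loss in `masked_l1_loss` is also an assumption, since the abstract does not name the loss. Both function names are hypothetical.

```python
import numpy as np


def visibility_mask(depth_tgt, K, T_src_from_tgt, depth_src, tol=0.05):
    """Hypothetical depth-reprojection visibility check (stand-in for splat-derived masks).

    depth_tgt: (H, W) target-view depth; depth_src: (Hs, Ws) source-view depth.
    K: (3, 3) pinhole intrinsics shared by both views (assumption).
    T_src_from_tgt: (4, 4) rigid transform from target to source camera coordinates.
    Returns an (H, W) bool mask: True where the target pixel is seen by the source view.
    """
    H, W = depth_tgt.shape
    v, u = np.mgrid[0:H, 0:W]
    # Homogeneous pixel centres, one row per target pixel
    pix = np.stack([u + 0.5, v + 0.5, np.ones_like(u, dtype=float)], axis=-1).reshape(-1, 3)
    # Back-project to 3D points in the target camera frame
    pts_tgt = (pix @ np.linalg.inv(K).T) * depth_tgt.reshape(-1, 1)
    pts_h = np.concatenate([pts_tgt, np.ones((pts_tgt.shape[0], 1))], axis=1)
    pts_src = (pts_h @ T_src_from_tgt.T)[:, :3]
    z = pts_src[:, 2]
    in_front = z > 1e-6
    # Project into the source image (guard the division for points behind the camera)
    proj = pts_src @ K.T
    safe_z = np.where(in_front, z, 1.0)
    ui = np.floor(proj[:, 0] / safe_z).astype(int)
    vi = np.floor(proj[:, 1] / safe_z).astype(int)
    Hs, Ws = depth_src.shape
    inside = in_front & (ui >= 0) & (ui < Ws) & (vi >= 0) & (vi < Hs)
    visible = np.zeros(H * W, dtype=bool)
    idx = np.nonzero(inside)[0]
    d_src = depth_src[vi[idx], ui[idx]]
    # Not occluded if the reprojected depth matches what the source view observed
    visible[idx] = np.abs(z[idx] - d_src) <= tol * d_src
    return visible.reshape(H, W)


def masked_l1_loss(pred, target, mask, eps=1e-8):
    """Mean absolute error over masked pixels only (assumed L1 loss).

    pred, target: (H, W, C) float arrays; mask: (H, W) bool array.
    Pixels outside the mask contribute nothing, so unseen regions are never fitted.
    """
    m = mask.astype(pred.dtype)[..., None]  # broadcast the mask over channels
    abs_err = np.abs(pred - target) * m
    denom = m.sum() * pred.shape[-1]        # number of masked entries
    return float(abs_err.sum() / max(denom, eps))
```

In a training loop one would call `mask = visibility_mask(...)` for each source/target pair and then `masked_l1_loss(rendered, target_image, mask)`. Predictions in masked-out regions then receive no gradient, which is the property the abstract credits for the improved quality in visible regions.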
