CV · Dec 11, 2025

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

arXiv:2512.10959v1 · 2 citations · h-index: 36
Originality: Highly original
AI Analysis

This work provides a scalable, depth-free solution for stereo generation, which could benefit applications in virtual reality and 3D media, though it is incremental in advancing diffusion-based approaches for this task.

The paper tackles the problem of generating stereo images from monocular views without explicit depth or warping. It introduces StereoSpace, a diffusion-based framework that models geometry through viewpoint conditioning in a canonical rectified space. The authors report better perceptual comfort (iSQoE) and geometric consistency (MEt3R) than warp & inpaint, latent-warping, and warped-conditioning methods.

We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.

