CV · May 27, 2025

Text2Stereo: Repurposing Stable Diffusion for Stereo Generation with Consistency Rewards

arXiv:2506.05367v2 · 2 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses stereo image generation for applications such as VR and 3D visualization. It is an incremental advance because it builds on existing diffusion models rather than introducing a new generative architecture.

The paper generates stereo images from text prompts by fine-tuning Stable Diffusion on stereo datasets and then further tuning it with prompt-alignment and stereo-consistency rewards. It reports higher-quality stereo images than existing methods.

In this paper, we propose a novel diffusion-based approach to generate stereo images given a text prompt. Since stereo image datasets with large baselines are scarce, training a diffusion model from scratch is not feasible. Therefore, we propose leveraging the strong priors learned by Stable Diffusion and fine-tuning it on stereo image datasets to adapt it to the task of stereo generation. To improve stereo consistency and text-to-image alignment, we further tune the model using prompt alignment and our proposed stereo consistency reward functions. Comprehensive experiments demonstrate the superiority of our approach in generating high-quality stereo images across diverse scenarios, outperforming existing methods.
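The abstract does not say how the stereo consistency reward is computed. As a purely illustrative sketch (an assumption, not the paper's definition), a common way to score stereo consistency is photometric: warp the right view into the left view using a horizontal disparity map and penalize the mean absolute difference over pixels that land inside the image. The function names `warp_right_to_left` and `stereo_consistency_reward`, the 2D-list image format, and the integer disparity convention are all hypothetical.

```python
# Hypothetical photometric stereo-consistency reward (NOT the paper's definition).
# Assumptions: rectified grayscale images as 2D lists of floats, and an integer
# per-pixel disparity map d for the left view, so that left pixel (y, x)
# corresponds to right pixel (y, x - d[y][x]).
from typing import List, Tuple

Image = List[List[float]]
Disparity = List[List[int]]


def warp_right_to_left(right: Image, disparity: Disparity) -> Tuple[Image, List[List[bool]]]:
    """Resample the right view onto the left view's pixel grid.

    Returns the warped image and a mask marking pixels whose source
    location falls inside the right image.
    """
    h, w = len(right), len(right[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xr = x - disparity[y][x]  # matching column in the right view
            if 0 <= xr < w:
                warped[y][x] = right[y][xr]
                valid[y][x] = True
    return warped, valid


def stereo_consistency_reward(left: Image, right: Image, disparity: Disparity) -> float:
    """Negative mean absolute photometric error over valid pixels.

    Higher (closer to 0) means the two views agree better under the given
    disparity. Returns 0.0 if no pixel has a valid correspondence.
    """
    warped, valid = warp_right_to_left(right, disparity)
    total, count = 0.0, 0
    for y in range(len(left)):
        for x in range(len(left[0])):
            if valid[y][x]:
                total += abs(left[y][x] - warped[y][x])
                count += 1
    return -total / count if count else 0.0
```

In a reward-tuning loop, a score like this would be computed on the generated stereo pair and combined with a prompt-alignment score; how the paper weights or defines those terms is not stated here.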
