Text2Stereo: Repurposing Stable Diffusion for Stereo Generation with Consistency Rewards
This work addresses stereo image generation for applications such as VR and 3D visualization, though its contribution is incremental because it builds on existing diffusion models.
The paper generates stereo images from text prompts by fine-tuning Stable Diffusion on stereo datasets and then tuning it further with prompt-alignment and stereo-consistency rewards; the authors report that it outperforms existing methods.
In this paper, we propose a novel diffusion-based approach to generate stereo images given a text prompt. Since stereo image datasets with large baselines are scarce, training a diffusion model from scratch is not feasible. Therefore, we propose leveraging the strong priors learned by Stable Diffusion and fine-tuning it on stereo image datasets to adapt it to the task of stereo generation. To improve stereo consistency and text-to-image alignment, we further tune the model using prompt alignment and our proposed stereo consistency reward functions. Comprehensive experiments demonstrate the superiority of our approach in generating high-quality stereo images across diverse scenarios, outperforming existing methods.
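The abstract does not define the stereo consistency reward, so the following is only an illustrative sketch of what such a score could look like, not the authors' formulation. It assumes a rectified stereo pair and a known left-view disparity map, warps the right image into the left view, and returns the negative mean photometric error. The function names `warp_right_to_left` and `stereo_consistency_score`, and the choice of an L1 photometric error, are hypothetical.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Resample the right image at column x - d(x, y) to approximate the left view.

    Assumes a rectified pair: a left-view pixel at column x appears at
    column x - d in the right view. `right` is (H, W, C), `disparity` is (H, W).
    """
    h, w = disparity.shape
    xs = np.arange(w)[None, :] - disparity        # source column in the right image
    valid = (xs >= 0) & (xs <= w - 1)             # pixels whose source lies inside the image
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = (xs - x0)[..., None]                   # linear interpolation weight
    rows = np.arange(h)[:, None]
    warped = (1 - frac) * right[rows, x0] + frac * right[rows, x1]
    return warped, valid


def stereo_consistency_score(left, right, disparity):
    """Hypothetical score: negative mean L1 error between left and warped right.

    Higher (closer to zero) means the pair is more consistent with the
    given disparity. This is an assumption for illustration only.
    """
    warped, valid = warp_right_to_left(right, disparity)
    if not valid.any():
        return 0.0
    err = np.abs(left - warped).mean(axis=-1)     # per-pixel error averaged over channels
    return float(-err[valid].mean())


# Synthetic check: build a right view by shifting the left view 3 pixels.
rng = np.random.default_rng(0)
left = rng.random((8, 16, 3))
right = np.roll(left, -3, axis=1)                 # right[:, x] == left[:, x + 3]
good = stereo_consistency_score(left, right, np.full((8, 16), 3.0))
bad = stereo_consistency_score(left, right, np.zeros((8, 16)))
```

In this synthetic check, `good` uses the true disparity and `bad` uses a wrong one, so `good` should be the higher of the two. In a reward-tuning setup, a score of this kind would be computed on generated pairs, with the disparity coming from an external estimator; that pipeline is outside what the abstract specifies.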