Scene-Conditional 3D Object Stylization and Composition
This work addresses the need for more controllable and realistic object-scene integration in 3D generative models, offering incremental improvements in stylization and composition.
Existing 3D generative models produce objects in isolation, without scene context. The paper addresses this by proposing a framework that stylizes 3D assets to fit into 2D scenes and produces photorealistic compositions, and it demonstrates applicability to a variety of indoor and outdoor scenes and arbitrary objects.
Recently, 3D generative models have made impressive progress, enabling the generation of almost arbitrary 3D assets from text or image inputs. However, these approaches generate objects in isolation, without any consideration for the scene where they will eventually be placed. In this paper, we propose a framework that stylizes an existing 3D asset to fit into a given 2D scene and additionally produces a photorealistic composition, as if the asset were placed within the environment. This not only opens up a new level of control for object stylization (for example, the same asset can be stylized to reflect changes in the environment, such as summer to winter or fantasy versus futuristic settings), but also makes the object-scene composition more controllable. We achieve this by modeling and optimizing the object's texture and the environmental lighting through differentiable ray tracing, combined with image priors from pre-trained text-to-image diffusion models. We demonstrate that our method is applicable to a wide variety of indoor and outdoor scenes and arbitrary objects. Project page: https://jensenzhoujh.github.io/scene-cond-3d/.
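The joint optimization of texture and lighting mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a simple Lambertian shading model in place of a differentiable ray tracer, and a fixed target image in place of guidance from a text-to-image diffusion prior. The names `render` and `fit` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def render(albedo, light_rgb, shading):
    """Toy differentiable renderer (assumption: Lambertian, no ray tracing).

    albedo:    (N, 3) per-texel RGB texture
    light_rgb: (3,)   RGB color of a single environment light
    shading:   (N,)   precomputed cosine term per texel
    """
    return albedo * light_rgb[None, :] * shading[:, None]

def fit(target, shading, steps=500, lr=0.5, seed=0):
    """Jointly optimize texture and lighting by gradient descent.

    `target` stands in for the image-prior signal; in the paper that
    signal comes from a pre-trained diffusion model, not a fixed image.
    """
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    albedo = rng.uniform(0.3, 0.7, size=(n, 3))  # initial texture guess
    light = np.ones(3)                           # initial white light
    losses = []
    for _ in range(steps):
        resid = render(albedo, light, shading) - target
        losses.append(float(np.mean(resid ** 2)))
        scale = 2.0 / resid.size
        # Analytic gradients of the mean squared error.
        grad_albedo = scale * resid * light[None, :] * shading[:, None]
        grad_light = scale * np.sum(resid * albedo * shading[:, None], axis=0)
        # Projected gradient step: keep albedo in [0, 1] and light non-negative.
        albedo = np.clip(albedo - lr * grad_albedo, 0.0, 1.0)
        light = np.maximum(light - lr * grad_light, 0.0)
    return albedo, light, losses

# Usage: recover a texture and a warm light from a synthetic target.
rng = np.random.default_rng(1)
shading = rng.uniform(0.2, 1.0, size=64)
true_albedo = rng.uniform(0.1, 0.9, size=(64, 3))
target = render(true_albedo, np.array([1.0, 0.8, 0.6]), shading)
albedo, light, losses = fit(target, shading)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Note that albedo and light color are only determined up to a shared scale in this toy model, so the recovered values need not match the ground truth even when the rendered image does. The sketch shows only the structure of the optimization: both texture and lighting receive gradients through the renderer from the same image-space loss.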