CV · Apr 16, 2024

StyleCity: Large-Scale 3D Urban Scenes Stylization

arXiv:2404.10681v2 · 4 citations · h-index: 8 · ECCV
Originality: Incremental advance
AI Analysis

This paper addresses virtual production prototyping for creators by enabling efficient stylization of urban scenes without complex material and lighting setups. Its contribution is incremental: it applies existing 2D priors to 3D.

The paper tackles the challenge of creating large-scale 3D urban scenes in varied styles by introducing StyleCity, a vision-and-text-driven texture stylization system that stylizes a neural texture field and generates a harmonious omnidirectional sky background. The authors report superior qualitative and quantitative results, as well as stronger user preference, in their experiments.

Creating large-scale virtual urban scenes with variant styles is inherently challenging. To facilitate prototypes of virtual production and bypass the need for complex materials and lighting setups, we introduce the first vision-and-text-driven texture stylization system for large-scale urban scenes, StyleCity. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generates a harmonic omnidirectional sky background. To achieve that, we propose to stylize a neural texture field by transferring 2D vision-and-text priors to 3D globally and locally. During 3D stylization, we progressively scale the planned training views of the input 3D scene at different levels in order to preserve high-quality scene content. We then optimize the scene style globally by adapting the scale of the style image with the scale of the training views. Moreover, we enhance local semantics consistency by the semantics-aware style loss which is crucial for photo-realistic stylization. Besides texture stylization, we further adopt a generative diffusion model to synthesize a style-consistent omnidirectional sky image, which offers a more immersive atmosphere and assists the semantic stylization process. The stylized neural texture field can be baked into an arbitrary-resolution texture, enabling seamless integration into conventional rendering pipelines and significantly easing the virtual production prototyping process. Extensive experiments demonstrate our stylized scenes' superiority in qualitative and quantitative performance and user preferences.
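The abstract notes that the stylized neural texture field can be baked into a texture of arbitrary resolution. The idea can be illustrated with a minimal sketch, assuming the field is a continuous function from UV coordinates in [0, 1] to an RGB colour. Everything here is hypothetical and not taken from the paper: `toy_field` is an analytic stand-in for a trained network, and `bake_texture` is an illustrative helper name.

```python
from typing import Callable, List, Tuple

RGB = Tuple[float, float, float]
TextureField = Callable[[float, float], RGB]


def toy_field(u: float, v: float) -> RGB:
    """Hypothetical stand-in for a trained texture field: a smooth gradient."""
    return (u, v, 0.5 * (u + v))


def bake_texture(field: TextureField, width: int, height: int) -> List[List[RGB]]:
    """Sample `field` at the centre of every texel of a width x height grid."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    texture = []
    for row in range(height):
        v = (row + 0.5) / height  # texel-centre V coordinate
        texture.append(
            [field((col + 0.5) / width, v) for col in range(width)]
        )
    return texture


# The same continuous field can be baked at any resolution.
low = bake_texture(toy_field, 4, 4)
high = bake_texture(toy_field, 16, 8)
```

Because the field is queried per texel rather than stored on a fixed grid, the output resolution is chosen only at bake time, which is what allows the result to be exported as an ordinary image texture for a conventional renderer.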

