CV, AI, LG · Jun 13, 2024

Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

arXiv:2406.09292v2 · 28 citations
Originality: Highly original
AI Analysis

This paper addresses the challenge of precise 3D object manipulation in AI-generated scenes, with applications such as video editing and 3D scene synthesis.

The paper tackles multi-object 3D pose control in image diffusion models by proposing Neural Assets: per-object representations that control the pose of individual objects, enabling fine-grained editing and achieving state-of-the-art results on synthetic and real-world datasets.

We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are trained to reconstruct the respective objects in a different image, e.g., a later frame in the video. Importantly, we encode object visuals from the reference image while conditioning on object poses from the target frame. This enables learning disentangled appearance and pose features. Combining visual and 3D pose representations in a sequence-of-tokens format allows us to keep the text-to-image architecture of existing models, with Neural Assets in place of text tokens. By fine-tuning a pre-trained text-to-image diffusion model with this information, our approach enables fine-grained 3D pose and placement control of individual objects in a scene. We further demonstrate that Neural Assets can be transferred and recomposed across different scenes. Our model achieves state-of-the-art multi-object editing results on both synthetic 3D scene datasets and two real-world video datasets (Objectron, Waymo Open).
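To make the token construction described in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the average pooling over a 2D box, the linear pose projection, and all array shapes are assumptions chosen for illustration. The only parts taken from the abstract are that appearance is pooled from the reference image, pose comes from the target frame, and the two are combined into one token per object.

```python
import numpy as np

def pool_appearance(feature_map, box):
    """Average-pool features inside a 2D box (x0, y0, x1, y1), in feature-map cells.

    Hypothetical stand-in for the pooling step; the paper's exact pooling may differ.
    """
    x0, y0, x1, y1 = box
    region = feature_map[y0:y1, x0:x1]                      # (h, w, C)
    return region.reshape(-1, feature_map.shape[-1]).mean(axis=0)  # (C,)

def encode_pose(pose, weight):
    """Project a flattened pose descriptor with a linear map (stand-in for a learned encoder)."""
    return pose.reshape(-1) @ weight                        # (D,)

def neural_asset_tokens(ref_features, ref_boxes, tgt_poses, pose_weight):
    """Build one token per object: appearance from the reference image, pose from the target frame."""
    tokens = []
    for box, pose in zip(ref_boxes, tgt_poses):
        appearance = pool_appearance(ref_features, box)     # reference image
        pose_emb = encode_pose(pose, pose_weight)           # target frame
        tokens.append(np.concatenate([appearance, pose_emb]))
    return np.stack(tokens)                                 # (num_objects, C + D)

# Toy usage with random data (all shapes are illustrative assumptions).
rng = np.random.default_rng(0)
ref_features = rng.normal(size=(16, 16, 8))     # H x W x C feature map of the reference image
ref_boxes = [(0, 0, 4, 4), (8, 8, 16, 16)]      # one 2D box per object in the reference image
tgt_poses = rng.normal(size=(2, 4, 3))          # hypothetical per-object pose descriptor in the target frame
pose_weight = rng.normal(size=(12, 4))          # maps the 12-dim flattened pose to a 4-dim embedding

tokens = neural_asset_tokens(ref_features, ref_boxes, tgt_poses, pose_weight)
print(tokens.shape)
```

In the approach described above, a sequence like `tokens` would take the place of text tokens as conditioning for the diffusion model. Changing only the pose half of a token while keeping its appearance half is the kind of edit that the disentangled training is meant to support.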

