CV · Dec 11, 2024

3D Mesh Editing using Masked LRMs

arXiv:2412.08641v2 · 15 citations · h-index: 11
AI Analysis

This paper addresses the problem of efficient and expressive 3D shape editing for computer graphics and vision applications. It represents an incremental improvement over existing methods.

The paper tackles 3D mesh editing by formulating it as a conditional reconstruction problem in which a masked 3D region is filled in under image guidance. It reports reconstruction quality on par with the state of the art while running 2-10x faster than the top-performing prior work.

We present a novel approach to shape editing, building on recent progress in 3D reconstruction from multi-view images. We formulate shape editing as a conditional reconstruction problem, where the model must reconstruct the input shape with the exception of a specified 3D region, in which the geometry should be generated from the conditional signal. To this end, we train a conditional Large Reconstruction Model (LRM) for masked reconstruction, using multi-view consistent masks rendered from a randomly generated 3D occlusion, and using one clean viewpoint as the conditional signal. During inference, we manually define a 3D region to edit and provide an edited image from a canonical viewpoint to fill that region. We demonstrate that, in just a single forward pass, our method not only preserves the input geometry in the unmasked region through reconstruction capabilities on par with SoTA, but is also expressive enough to perform a variety of mesh edits from a single image guidance that past works struggle with, while being 2-10x faster than the top-performing prior work.
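The multi-view consistent masking described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the random 3D occlusion is an axis-aligned box, that cameras sit on a ring around the origin, and that a crude point-splatting rasterizer is good enough. All function names and parameters (`look_at`, `project`, `render_mask`, `random_occlusion_masks`, `focal`, `radius`) are hypothetical. The point it shows is that every 2D mask is a projection of the same 3D region, which is what makes the masks consistent across views.

```python
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera matrix [R | t] (3x4); camera axes: x right, y down, z forward."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])
    t = -R @ eye
    return np.hstack([R, t[:, None]])

def project(points, w2c, focal, size):
    """Pinhole projection of (N, 3) world points to pixel coordinates and depths.

    Assumes the points have nonzero depth in the camera frame.
    """
    cam = points @ w2c[:, :3].T + w2c[:, 3]
    z = cam[:, 2]
    uv = focal * cam[:, :2] / z[:, None] + size / 2.0
    return uv, z

def render_mask(box_min, box_max, w2c, focal, size, rng, n_samples=20000):
    """Splat points sampled inside the 3D box into a boolean (size, size) mask."""
    pts = rng.uniform(box_min, box_max, size=(n_samples, 3))
    uv, z = project(pts, w2c, focal, size)
    px = np.floor(uv).astype(int)
    keep = (z > 0) & np.all((px >= 0) & (px < size), axis=1)
    mask = np.zeros((size, size), dtype=bool)
    mask[px[keep, 1], px[keep, 0]] = True  # row = v, column = u
    return mask

def random_occlusion_masks(n_views=4, size=64, focal=64.0, radius=2.5, seed=0):
    """Sample one random 3D box (an assumed occluder shape) and render it in every view."""
    rng = np.random.default_rng(seed)
    center = rng.uniform(-0.3, 0.3, size=3)
    half = rng.uniform(0.1, 0.3, size=3)
    box_min, box_max = center - half, center + half
    masks, cams = [], []
    for k in range(n_views):
        az = 2.0 * np.pi * k / n_views
        eye = radius * np.array([np.cos(az), np.sin(az), 0.4])
        w2c = look_at(eye)
        cams.append(w2c)
        masks.append(render_mask(box_min, box_max, w2c, focal, size, rng))
    return (box_min, box_max), np.stack(masks), np.stack(cams)

if __name__ == "__main__":
    (box_min, box_max), masks, cams = random_occlusion_masks()
    print("box:", box_min.round(2), box_max.round(2))
    print("masked fraction per view:", masks.mean(axis=(1, 2)).round(3))
```

In a training setup along the lines the abstract describes, masks like these would hide the corresponding pixels in each rendered view, while one additional unmasked view serves as the conditional signal. At inference the box would be replaced by the manually defined edit region.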
