DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
This work addresses the challenge of 3D reconstruction from limited views, with applications in computer vision and graphics, though it is incremental in that it builds on existing NeRF frameworks.
The paper tackles the problem of learning structured 3D abstractions from sparse RGB images without 3D supervision, achieving superior performance over state-of-the-art methods for 3D primitive abstraction.
We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object. By leveraging differentiable volume rendering, our method does not require 3D supervision. Architecturally, our network follows the general pipeline of an image-conditioned neural radiance field (NeRF) exemplified by pixelNeRF for color prediction. As our core contribution, we introduce differentiable primitive assembly (DPA) into NeRF to output a 3D occupancy field in place of density prediction, where the predicted occupancies serve as opacity values for volume rendering. Our network, coined DPA-Net, produces a union of convexes, each as an intersection of convex quadric primitives, to approximate the target 3D object, subject to an abstraction loss and a masking loss, both defined in the image space upon volume rendering. With test-time adaptation and additional sampling and loss designs aimed at improving the accuracy and compactness of the obtained assemblies, our method demonstrates superior performance over state-of-the-art alternatives for 3D primitive abstraction from sparse views.
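To make the assembly idea concrete, the following is a minimal NumPy sketch, not the DPA-Net implementation. It assumes each primitive is a quadric q(x) = x_h^T Q x_h (with x_h the homogeneous point) that counts as "inside" where q(x) <= 0, uses a sigmoid for soft membership, and uses min and max as simple stand-ins for intersection and union; the abstract does not specify the operators the network actually uses. The function names, the `selection` matrix, and the `sharpness` parameter are all hypothetical. The resulting occupancy is then used directly as the per-sample opacity in standard front-to-back compositing.

```python
import numpy as np

def quadric_values(points, quadrics):
    """Evaluate q_k(x) = x_h^T Q_k x_h for N points and K quadrics -> (N, K)."""
    ones = np.ones((points.shape[0], 1))
    xh = np.concatenate([points, ones], axis=1)            # homogeneous, (N, 4)
    return np.einsum("ni,kij,nj->nk", xh, quadrics, xh)

def soft_occupancy(points, quadrics, selection, sharpness=50.0):
    """Occupancy of a union of convexes, each an intersection of quadrics.

    selection is a (C, K) boolean matrix: which primitives form each convex.
    Assumption: every convex selects at least one primitive.
    """
    q = quadric_values(points, quadrics)                   # (N, K)
    # Soft inside test: close to 1 where q < 0, close to 0 where q > 0.
    inside = 1.0 / (1.0 + np.exp(np.clip(sharpness * q, -60.0, 60.0)))
    # Intersection as min over the selected primitives of each convex.
    masked = np.where(selection[None], inside[:, None, :], 1.0)  # (N, C, K)
    convex_occ = masked.min(axis=2)                        # (N, C)
    # Union as max over convexes.
    return convex_occ.max(axis=1)                          # (N,)

def composite(alpha, colors):
    """Front-to-back compositing with occupancy used as opacity.

    Returns the rendered color and the accumulated opacity (a mask value).
    """
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * transmittance
    return weights @ colors, weights.sum()

# Hypothetical primitives: a unit sphere, a slab |x| <= 0.5, and a small
# sphere of radius 0.5 centered at (2, 0, 0).
unit_sphere = np.diag([1.0, 1.0, 1.0, -1.0])
slab = np.diag([1.0, 0.0, 0.0, -0.25])
small_sphere = np.diag([1.0, 1.0, 1.0, 3.75])
small_sphere[0, 3] = small_sphere[3, 0] = -2.0
quadrics = np.stack([unit_sphere, slab, small_sphere])

# Convex 0 = unit sphere AND slab; convex 1 = small sphere.
selection = np.array([[True, True, False],
                      [False, False, True]])

# One ray through the shape along the x-axis and one that misses it.
xs = np.linspace(-3.0, 3.0, 61)
hit_pts = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=1)
miss_pts = np.stack([xs, np.full_like(xs, 5.0), np.zeros_like(xs)], axis=1)
red = np.tile(np.array([1.0, 0.0, 0.0]), (xs.size, 1))

hit_rgb, hit_mask = composite(soft_occupancy(hit_pts, quadrics, selection), red)
miss_rgb, miss_mask = composite(soft_occupancy(miss_pts, quadrics, selection), red)
```

In this sketch the accumulated opacity plays the role of a rendered silhouette, which is the kind of image-space quantity a masking loss could compare against an object mask. In a learned setting the quadric parameters and the selection would be predicted and relaxed to continuous values so that gradients can flow through them; here they are fixed by hand for illustration.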