CV
Jun 24, 2021

GaussiGAN: Controllable Image Synthesis with 3D Gaussians from Unposed Silhouettes

arXiv:2106.13215v1
5 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of structured 3D reconstruction from limited supervision for applications in computer vision and graphics, representing an incremental improvement over existing methods.

The paper tackles the problem of learning 3D object representations from unposed multi-view 2D masks, resulting in robust estimation of 3D camera and object spaces where recent baselines sometimes fail, and demonstrates object insertion with interactive posing on synthetic datasets.

We present an algorithm that learns a coarse 3D representation of objects from unposed multi-view 2D mask supervision, then uses it to generate detailed mask and image texture. In contrast to existing voxel-based methods for unposed object reconstruction, our approach learns to represent the generated shape and pose with a set of self-supervised canonical 3D anisotropic Gaussians via a perspective camera, and a set of per-image transforms. We show that this approach can robustly estimate a 3D space for the camera and object, while recent baselines sometimes struggle to reconstruct coherent 3D spaces in this setting. We show results on synthetic datasets with realistic lighting, and demonstrate object insertion with interactive posing. With our work, we help move towards structured representations that handle more real-world variation in learning-based object reconstruction.
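The abstract's core geometric idea, canonical 3D anisotropic Gaussians that are posed per image and viewed through a perspective camera, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with a hypothetical focal length `focal`, a rigid per-image transform `(R, t)`, a first-order (Jacobian) approximation of the projection, and a simple "union of blobs" rule for the coarse mask. The function names `pose_gaussian`, `project_gaussian`, and `soft_mask` are invented for illustration.

```python
import numpy as np

def pose_gaussian(mean, cov, R, t):
    """Apply a rigid per-image transform (assumed form) to a canonical 3D Gaussian."""
    return R @ mean + t, R @ cov @ R.T

def project_gaussian(mean, cov, focal=1.0):
    """Project a 3D Gaussian in camera coordinates (z > 0) to the image plane.

    Linearizes the pinhole map (x, y, z) -> (f*x/z, f*y/z) around the mean,
    so the result is an approximate 2D mean and 2x2 covariance.
    """
    x, y, z = mean
    mean2d = np.array([focal * x / z, focal * y / z])
    # Jacobian of the perspective projection evaluated at the 3D mean.
    J = np.array([
        [focal / z, 0.0, -focal * x / z**2],
        [0.0, focal / z, -focal * y / z**2],
    ])
    return mean2d, J @ cov @ J.T

def soft_mask(means2d, covs2d, size=33, extent=1.0):
    """Render a coarse silhouette as the soft union of 2D Gaussian blobs."""
    axis = np.linspace(-extent, extent, size)
    gx, gy = np.meshgrid(axis, axis)
    pix = np.stack([gx, gy], axis=-1)      # (size, size, 2) pixel centers
    miss = np.ones((size, size))           # chance that no blob covers a pixel
    for m, c in zip(means2d, covs2d):
        d = pix - m
        maha = np.einsum("...i,ij,...j->...", d, np.linalg.inv(c), d)
        miss *= 1.0 - np.exp(-0.5 * maha)
    return 1.0 - miss

# One canonical Gaussian, posed with an identity rotation and pushed 2 units
# in front of the camera (all values are illustrative).
canon_mean = np.zeros(3)
canon_cov = np.diag([0.04, 0.01, 0.09])
cam_mean, cam_cov = pose_gaussian(canon_mean, canon_cov, np.eye(3), np.array([0.0, 0.0, 2.0]))
mean2d, cov2d = project_gaussian(cam_mean, cam_cov)
mask = soft_mask([mean2d], [cov2d])
```

Because every step here is differentiable, a mask rendered this way could in principle be compared against an observed silhouette to fit the Gaussians and per-image poses, which is the kind of mask-only supervision the abstract describes. The detailed mask and texture generation stage is not covered by this sketch.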

