CV AI GR LG

Jan 9, 2019

Learning to Infer and Execute 3D Shape Programs

Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

arXiv:1901.02875v327.1159 citations

Originality: Highly original

AI Analysis

The paper addresses the challenge of capturing structural priors in 3D shape perception for computer vision and graphics applications. It proposes a novel method rather than an incremental improvement.

The paper tackles the problem of understanding higher-level structural relationships in 3D shapes, such as repetition and symmetry. It proposes 3D shape programs, which integrate bottom-up recognition with top-down symbolic structure. The resulting model accurately infers and executes these programs for complex shapes, and it improves the accuracy and physical plausibility of 3D shape reconstruction from RGB images.

Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts. In contrast, recent advances in 3D shape sensing focus more on low-level geometry but less on these higher-level relationships. In this paper, we propose 3D shape programs, integrating bottom-up recognition systems with top-down, symbolic program structure to capture both low-level geometry and high-level structural priors for 3D shapes. Because there are no annotations of shape programs for real shapes, we develop neural modules that not only learn to infer 3D shape programs from raw, unannotated shapes, but also to execute these programs for shape reconstruction. After initial bootstrapping, our end-to-end differentiable model learns 3D shape programs by reconstructing shapes in a self-supervised manner. Experiments demonstrate that our model accurately infers and executes 3D shape programs for highly complex shapes from various categories. It can also be integrated with an image-to-shape module to infer 3D shape programs directly from an RGB image, leading to 3D shape reconstructions that are both more accurate and more physically plausible.
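To make the idea of a shape program concrete, here is a minimal sketch of how repetition could be written as a loop and then executed into an occupancy set. It is an illustration only, not the paper's domain-specific language or its neural program executor: the names `Draw`, `Repeat`, and `execute`, the integer voxel coordinates, and the table example are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

Voxel = Tuple[int, int, int]


@dataclass
class Draw:
    """Hypothetical statement: place an axis-aligned cuboid.

    `origin` is the minimum corner (x, y, z); `size` is (width, height, depth).
    """
    origin: Tuple[int, int, int]
    size: Tuple[int, int, int]


@dataclass
class Repeat:
    """Hypothetical statement: run `body` `times` times, shifting by `step` each time."""
    times: int
    step: Tuple[int, int, int]
    body: List["Statement"]


Statement = Union[Draw, Repeat]


def execute(program: List[Statement], offset: Voxel = (0, 0, 0)) -> Set[Voxel]:
    """Interpret a program and return the set of occupied voxels."""
    occupied: Set[Voxel] = set()
    for stmt in program:
        if isinstance(stmt, Draw):
            ox, oy, oz = (stmt.origin[k] + offset[k] for k in range(3))
            w, h, d = stmt.size
            occupied.update(
                (x, y, z)
                for x in range(ox, ox + w)
                for y in range(oy, oy + h)
                for z in range(oz, oz + d)
            )
        else:
            # Repetition: the same body is executed at regularly shifted offsets.
            for i in range(stmt.times):
                shifted = tuple(offset[k] + i * stmt.step[k] for k in range(3))
                occupied |= execute(stmt.body, shifted)
    return occupied


# A toy table: one top, plus four legs written as two nested repetitions
# instead of four separate Draw statements.
table = [
    Draw(origin=(0, 4, 0), size=(6, 1, 6)),
    Repeat(times=2, step=(5, 0, 0), body=[
        Repeat(times=2, step=(0, 0, 5), body=[
            Draw(origin=(0, 0, 0), size=(1, 4, 1)),
        ]),
    ]),
]

voxels = execute(table)
print(len(voxels))
```

The nested `Repeat` states the regularity of the legs once, so changing the leg size or spacing is a single edit. This is the kind of high-level structure that a flat list of primitives would not expose. In the paper, both inference and execution are learned neural modules; the hand-written interpreter above only stands in for the execution step.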
