CV · May 23, 2024

Multistable Shape from Shading Emerges from Patch Diffusion

arXiv:2405.14530v2 · 6 citations · h-index: 2 · NIPS
Originality: Incremental advance
AI Analysis

This work addresses the inability of current models to capture the mathematical ambiguities in shape perception. It offers an approach to computer vision tasks that is better aligned with human perception.

The paper tackles shape from shading with a model that reconstructs multimodal distributions of shapes from a single image, in line with human multistable perception. It demonstrates multistable outputs on ambiguous test images and produces accurate estimates for less ambiguous ones.

Models for inferring monocular shape of surfaces with diffuse reflection -- shape from shading -- ought to produce distributions of outputs, because there are fundamental mathematical ambiguities of both continuous (e.g., bas-relief) and discrete (e.g., convex/concave) types that are also experienced by humans. Yet, the outputs of current models are limited to point estimates or tight distributions around single modes, which prevent them from capturing these effects. We introduce a model that reconstructs a multimodal distribution of shapes from a single shading image, which aligns with the human experience of multistable perception. We train a small denoising diffusion process to generate surface normal fields from $16\times 16$ patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency constraints. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for ambiguous test images that humans experience as being multistable. At the same time, the model produces veridical shape estimates for object-like images that include distinctive occluding contours and appear less ambiguous. This may inspire new architectures for stochastic 3D shape perception that are more efficient and better aligned with human experience.
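The discrete convex/concave ambiguity mentioned in the abstract can be checked numerically. Negating a depth map and mirroring the light's horizontal components leaves every Lambertian intensity unchanged.

The sketch below rests on several assumptions:
- a Lambertian surface with uniform albedo;
- a single distant light and an orthographic view;
- attached shadows only (clamping at zero), with no cast shadows.

The helper names (`normals_from_depth`, `lambertian`), the Gaussian-bump test surface, and the light direction are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def normals_from_depth(z):
    """Unit normals n = (-p, -q, 1) / sqrt(1 + p^2 + q^2) of a depth map z(x, y)."""
    # np.gradient returns the derivative along axis 0 (rows, y) first, then axis 1 (columns, x).
    q, p = np.gradient(z)
    n = np.stack([-p, -q, np.ones_like(z)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def lambertian(n, light):
    """Lambertian shading I = max(0, n . l) for a distant light direction l (normalized here)."""
    l = np.asarray(light, dtype=float)
    l = l / np.linalg.norm(l)
    return np.clip(n @ l, 0.0, None)  # clamp models attached shadows only

# Hypothetical test surface: a Gaussian bump sampled on a 16 x 16 grid.
y, x = np.mgrid[-1:1:16j, -1:1:16j]
bump = np.exp(-4.0 * (x**2 + y**2))

light = np.array([0.3, -0.5, 0.8])
flipped_light = light * np.array([-1.0, -1.0, 1.0])  # mirror the light's x and y components

img_convex = lambertian(normals_from_depth(bump), light)
img_concave = lambertian(normals_from_depth(-bump), flipped_light)

print(np.allclose(img_convex, img_concave))  # prints True
```

The identity follows directly from the normal formula. Flipping z negates p and q, which changes n to (p, q, 1)/N. Mirroring the light changes l to (-l_x, -l_y, l_z). The dot product is therefore unchanged.

A single shading image is thus consistent with two distinct surfaces, and a model that returns one point estimate must arbitrarily discard one of them. The continuous bas-relief family is a separate ambiguity and is not shown here.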
