CV · Apr 11, 2022

Physically Disentangled Representations

arXiv:2204.05281v1 · 1 citation · h-index: 85
Originality: Incremental advance
AI Analysis

This work addresses the need for more robust and interpretable scene representations in computer vision, offering incremental improvements over existing generative methods.

The paper tackles unsupervised learning of physically disentangled scene representations, which separate factors such as geometry and lighting. It combines inverse rendering with a novel loss function and reports up to 18% higher accuracy than semantic disentanglement methods on downstream tasks such as classification and segmentation.

State-of-the-art methods in generative representation learning yield semantic disentanglement, but typically do not consider physical scene parameters, such as geometry, albedo, lighting, or camera. We posit that inverse rendering, a way to reverse the rendering process to recover scene parameters from an image, can also be used to learn physically disentangled representations of scenes without supervision. In this paper, we show the utility of inverse rendering in learning representations that yield improved accuracy on downstream clustering, linear classification, and segmentation tasks with the help of our novel Leave-One-Out, Cycle Contrastive loss (LOOCC), which improves disentanglement of scene parameters and robustness to out-of-distribution lighting and viewpoints. We perform a comparison of our method with other generative representation learning methods across a variety of downstream tasks, including face attribute classification, emotion recognition, identification, face segmentation, and car classification. Our physically disentangled representations yield higher accuracy than semantically disentangled alternatives across all tasks and by as much as 18%. We hope that this work will motivate future research in applying advances in inverse rendering and 3D understanding to representation learning.
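The abstract reports gains on "linear classification" but does not spell out the evaluation protocol here. As a hedged illustration, assuming the common linear-probe setup (a logistic-regression classifier trained on frozen encoder outputs, with the encoder left untouched), such an evaluation might look like the pure-Python sketch below. The function names, the toy two-dimensional features (one hypothetical "factor" dimension that encodes the attribute, one nuisance dimension), and the hyperparameters are all illustrative assumptions, not taken from the paper or its code.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit a binary logistic-regression probe on frozen feature vectors.

    Hypothetical helper: only the probe's weights are learned; the
    features stand in for fixed encoder outputs.
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = _sigmoid(z) - y  # d(binary cross-entropy)/dz
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Full-batch gradient descent step on the averaged gradient.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose thresholded logit matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((1 if z > 0 else 0) == y)
    return correct / len(labels)


# Toy "frozen representations" (assumption): dimension 0 encodes the
# attribute, dimension 1 is an unrelated nuisance factor.
random.seed(0)
feats, labs = [], []
for i in range(100):
    y = i % 2
    feats.append([(2 * y - 1) + random.uniform(-0.5, 0.5),
                  random.uniform(-1.0, 1.0)])
    labs.append(y)

w, b = train_linear_probe(feats, labs)
acc = probe_accuracy(w, b, feats, labs)
print(f"linear-probe accuracy: {acc:.2f}")
```

The probe is deliberately weak (linear only), so its accuracy reflects how linearly accessible an attribute is in the learned representation rather than the capacity of the classifier. A real evaluation would also hold out a test split instead of scoring on the training features as this toy example does.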

Code Implementations: 1 repo