CV · Nov 27, 2014

Visual Representations: Defining Properties and Deep Approximations

arXiv:1411.7676v9 · 34 citations

Originality: Highly original

AI Analysis

This work offers a theoretical framework for understanding and improving visual representations in computer vision, a core problem for researchers and practitioners in the field.

The paper defines optimal visual representations as minimal sufficient statistics that are invariant to nuisance variability. It then links these theoretical constructs to practical methods such as convolutional neural networks and uses that link to explain common empirical practices.

Visual representations are defined in terms of minimal sufficient statistics of visual data, for a class of tasks, that are also invariant to nuisance variability. Minimal sufficiency guarantees that we can store a representation in lieu of raw data with smallest complexity and no performance loss on the task at hand. Invariance guarantees that the statistic is constant with respect to uninformative transformations of the data. We derive analytical expressions for such representations and show they are related to feature descriptors commonly used in computer vision, as well as to convolutional neural networks. This link highlights the assumptions and approximations tacitly assumed by these methods and explains empirical practices such as clamping, pooling and joint normalization.
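The invariance property described above can be illustrated with a toy sketch. This is not the paper's derivation: it assumes a one-dimensional signal, takes cyclic shifts as the only nuisance transformation, and uses hypothetical helper names (`cyclic_shift`, `max_pooled_response`). It shows only that max-pooling a template response over the whole nuisance group yields a statistic that does not change when the input is shifted.

```python
def cyclic_shift(signal, k):
    """Rotate a list right by k positions (the assumed nuisance transformation)."""
    n = len(signal)
    k %= n
    return signal[-k:] + signal[:-k] if k else list(signal)


def max_pooled_response(signal, template):
    """Max over all cyclic shifts of the inner product <shifted signal, template>.

    Pooling over the full set of shifts makes the result independent of
    where the pattern sits in the signal.
    """
    n = len(signal)
    return max(
        sum(a * b for a, b in zip(cyclic_shift(signal, k), template))
        for k in range(n)
    )


# Illustrative data, chosen for this example only.
signal = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
template = [1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
shifted = cyclic_shift(signal, 4)  # same pattern, different position

# The pooled statistic is identical for the original and the shifted signal.
print(max_pooled_response(signal, template))
print(max_pooled_response(shifted, template))
```

The sketch addresses invariance only; it says nothing about whether the pooled value is sufficient for a given task, which is the other half of the definition in the abstract.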
