CV · LG · ML · Jan 16, 2013

Learning Graphical Models of Images, Videos and Their Spatial Transformations

arXiv:1301.3854v1 · 39 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for transformation-invariant modeling in computer vision and image processing. The advance is incremental because it extends existing probabilistic models rather than introducing a new model family.

The paper makes probabilistic models such as mixtures of Gaussians invariant to spatial transformations (e.g., translation, shearing) in images and videos by incorporating a discrete transformation variable. It demonstrates results on tasks such as image filtering, clustering, and object tracking.

Mixtures of Gaussians, factor analyzers (probabilistic PCA) and hidden Markov models are staples of static and dynamic data modeling and image and video modeling in particular. We show how topographic transformations in the input, such as translation and shearing in images, can be accounted for in these models by including a discrete transformation variable. The resulting models perform clustering, dimensionality reduction and time-series analysis in a way that is invariant to transformations in the input. Using the EM algorithm, these transformation-invariant models can be fit to static data and time series. We give results on filtering microscopy images, face and facial pose clustering, handwritten digit modeling and recognition, video clustering, object tracking, and removal of distractions from video sequences.
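
To make the role of the discrete transformation variable concrete, here is a minimal sketch of EM for a shift-invariant mixture of Gaussians. It is not the authors' implementation, and it rests on several simplifying assumptions that are not in the abstract: inputs are 1-D signals, the transformation set is all cyclic shifts, the noise is isotropic with a single shared variance, and there is no latent (pre-transformation) noise or factor-analysis component. The function name `fit_shift_invariant_mixture` and all variable names are hypothetical.

```python
import numpy as np

def _shifted_sq_dist(X, mu):
    """Squared distance between every x_n and every cyclic shift of every mean.

    Returns an array of shape (N, C, L) where L == D is the number of shifts.
    """
    D = X.shape[1]
    # M[c, l] is cluster mean c rolled right by l positions.
    M = np.stack([np.roll(mu, l, axis=1) for l in range(D)], axis=1)  # (C, L, D)
    return ((X[:, None, None, :] - M[None]) ** 2).sum(-1)

def fit_shift_invariant_mixture(X, n_clusters, n_iters=30, seed=0):
    """EM for a mixture of Gaussians with a discrete cyclic-shift variable.

    Assumed generative model (a simplification for illustration):
        c ~ Categorical(pi), l ~ Categorical(rho),
        x = roll(mu[c], l) + isotropic Gaussian noise with variance var.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, n_clusters, replace=False)].copy()  # init from data
    pi = np.full(n_clusters, 1.0 / n_clusters)               # cluster prior
    rho = np.full(D, 1.0 / D)                                # shift prior
    var = X.var() + 1e-8

    # Un-shifted copies of the data: X_back[n, l] = roll(x_n, -l).
    X_back = np.stack([np.roll(X, -l, axis=1) for l in range(D)], axis=1)

    for _ in range(n_iters):
        # E-step: joint posterior over (cluster, shift) for each sample.
        d2 = _shifted_sq_dist(X, mu)
        log_r = (np.log(pi + 1e-12)[None, :, None]
                 + np.log(rho + 1e-12)[None, None, :]
                 - 0.5 * d2 / var)
        log_r -= log_r.max(axis=(1, 2), keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=(1, 2), keepdims=True)

        # M-step: average the data after undoing each candidate shift.
        Nc = r.sum(axis=(0, 2)) + 1e-12
        mu = np.einsum('ncl,nld->cd', r, X_back) / Nc[:, None]
        pi = Nc / Nc.sum()
        rho = r.sum(axis=(0, 1)) / N
        var = (r * _shifted_sq_dist(X, mu)).sum() / (N * D) + 1e-8

    return mu, pi, rho, var, r
```

In this sketch, `r[n]` is the joint posterior over cluster and shift for sample `n`. Summing it over the shift axis gives a cluster assignment that does not depend on where the pattern sits in the signal, and taking the arg-max over the shift axis gives the most probable alignment relative to the learned mean.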

