CVSYApr 19, 2012

Dynamic Template Tracking and Recognition

arXiv:1204.4476v116 citations
Originality Incremental advance
AI Analysis

It addresses the problem of robust tracking for dynamic objects in computer vision, with applications in surveillance and human-computer interaction, but is incremental as it extends existing tracking methods to dynamic templates.

The paper tackles tracking non-rigid objects like dynamic textures and articulated humans by modeling appearance and motion with Linear Dynamical Systems, achieving performance comparable to or better than state-of-the-art methods on synthetic and real examples.

In this paper we address the problem of tracking non-rigid objects whose local appearance and motion changes as a function of time. This class of objects includes dynamic textures such as steam, fire, smoke, water, etc., as well as articulated objects such as humans performing various actions. We model the temporal evolution of the object's appearance/motion using a Linear Dynamical System (LDS). We learn such models from sample videos and use them as dynamic templates for tracking objects in novel videos. We pose the problem of tracking a dynamic non-rigid object in the current frame as a maximum a-posteriori estimate of the location of the object and the latent state of the dynamical system, given the current image features and the best estimate of the state in the previous frame. The advantage of our approach is that we can specify a-priori the type of texture to be tracked in the scene by using previously trained models for the dynamics of these textures. Our framework naturally generalizes common tracking methods such as SSD and kernel-based tracking from static templates to dynamic templates. We test our algorithm on synthetic as well as real examples of dynamic textures and show that our simple dynamics-based trackers perform at par if not better than the state-of-the-art. Since our approach is general and applicable to any image feature, we also apply it to the problem of human action tracking and build action-specific optical flow trackers that perform better than the state-of-the-art when tracking a human performing a particular action. Finally, since our approach is generative, we can use a-priori trained trackers for different texture or action classes to simultaneously track and recognize the texture or action in the video.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes