CV · Dec 2, 2015

Actions ~ Transformations

arXiv:1512.00795v2 · 241 citations
Originality: Highly original
AI Analysis

This work addresses action understanding in computer vision. It proposes a representation of actions that generalizes beyond the categories seen during training.

The paper defines actions as transformations that change the state of the environment. It proposes a Siamese network model that improves action recognition on the UCF101 and HMDB51 datasets and shows strong cross-category generalization on a new dataset, ACT.

What defines an action like "kicking a ball"? We argue that the true meaning of an action lies in the change, or transformation, that it brings to the environment. In this paper, we propose a novel representation that models an action as a transformation from the state of the environment before the action happens (precondition) to the state after the action (effect). Motivated by recent advances in deep-learning video representations, we design a Siamese network that models the action as a transformation in a high-level feature space. We show that our model improves results on standard action recognition datasets, including UCF101 and HMDB51. More importantly, our approach generalizes beyond the learned action categories and shows a significant improvement in cross-category generalization on our new ACT dataset.
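The abstract describes the model only at a high level. The sketch below illustrates the "action as transformation" idea under stated assumptions. It assumes the two Siamese branches have already produced fixed-length embeddings for the precondition and effect segments. It also assumes each action class owns one linear transformation matrix, and that an action is classified by picking the class whose transformed precondition is most similar (cosine similarity) to the effect. All names (`action_scores`, `predict_action`, `margin_loss`), the hinge-style margin loss, and the margin value are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def action_scores(pre_feat, eff_feat, transforms):
    """Cosine similarity between each class's transformed precondition and the effect.

    pre_feat, eff_feat: (d,) embeddings (assumed outputs of the two Siamese branches).
    transforms: (num_classes, d, d), one hypothetical linear transform per action class.
    """
    predicted_effects = transforms @ pre_feat           # (num_classes, d)
    return l2_normalize(predicted_effects) @ l2_normalize(eff_feat)  # (num_classes,)


def predict_action(pre_feat, eff_feat, transforms):
    """Pick the class whose transformation best explains precondition -> effect."""
    return int(np.argmax(action_scores(pre_feat, eff_feat, transforms)))


def margin_loss(pre_feat, eff_feat, transforms, label, margin=0.5):
    """Hypothetical hinge loss: the true class should beat every other class by `margin`."""
    scores = action_scores(pre_feat, eff_feat, transforms)
    others = np.delete(scores, label)
    return float(np.maximum(0.0, margin - scores[label] + others).sum())


# Toy usage: three hypothetical classes acting on 3-d embeddings.
transforms = np.stack([
    np.eye(3),                                              # class 0: "no change"
    np.array([[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]]),   # class 1: cyclic shift
    -np.eye(3),                                             # class 2: sign flip
])
pre = np.array([1.0, 2.0, 3.0])
eff = transforms[1] @ pre  # an effect produced by class 1's transformation
print(predict_action(pre, eff, transforms))  # 1
```

Scoring with cosine similarity compares only the direction of the predicted and observed effect embeddings, so a class wins when its transformation points the precondition toward the effect regardless of feature magnitude. Because the class is identified by the transformation rather than by appearance alone, the same scoring rule can in principle be applied to precondition/effect pairs from categories not seen together during training.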

Code Implementations: 1 repo