CV · Aug 4, 2012

Human Activity Learning using Object Affordances from RGB-D Videos

arXiv:1208.0967v1 · 19 citations
AI Analysis

This work addresses activity recognition for applications such as robotics and surveillance by integrating object affordances. It is incremental, however, because it builds on existing methods for joint labeling.

The paper tackles the problem of jointly labeling object affordances and human activities from RGB-D videos. It frames the problem as a Markov Random Field and learns the model with a structural SVM approach, achieving an end-to-end precision of 81.8% and recall of 80.0% on a dataset of 120 activity videos.

Human activities comprise several sub-activities performed in a sequence and involve interactions with various objects. This makes reasoning about the object affordances a central task for activity recognition. In this work, we consider the problem of jointly labeling the object affordances and human activities from RGB-D videos. We frame the problem as a Markov Random Field where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural SVM approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a dataset comprising 120 activity videos collected from four subjects, and obtained an end-to-end precision of 81.8% and recall of 80.0% for labeling the activities.
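The joint-labeling idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: the label names, node names, weights, and the brute-force search are all assumptions chosen for illustration, whereas the paper learns its weights with a structural SVM and treats temporal segmentations as latent variables. It only shows how a labeling of object and sub-activity nodes can be scored as a sum of node terms and edge terms, with the highest-scoring joint labeling selected.

```python
from itertools import product

def score(labels, node_w, edge_w, edges):
    """Score one joint labeling: sum of node terms plus edge terms."""
    total = sum(node_w[n][labels[n]] for n in labels)
    # Label pairs with no entry in edge_w contribute nothing.
    total += sum(edge_w.get((labels[a], labels[b]), 0.0) for a, b in edges)
    return total

def best_labeling(node_w, edge_w, edges):
    """Exhaustively search all labelings (feasible only for toy graphs)."""
    nodes = list(node_w)
    best, best_score = None, float("-inf")
    for combo in product(*(list(node_w[n]) for n in nodes)):
        labels = dict(zip(nodes, combo))
        s = score(labels, node_w, edge_w, edges)
        if s > best_score:
            best, best_score = labels, s
    return best, best_score

# Hypothetical graph: one object node and one sub-activity node
# in each of two temporal segments (t1, t2). All weights are made up.
node_w = {
    "obj_t1": {"reachable": 1.0, "movable": 0.2},
    "act_t1": {"reaching": 0.8, "moving": 0.5},
    "obj_t2": {"reachable": 0.3, "movable": 0.4},
    "act_t2": {"reaching": 0.4, "moving": 0.5},
}
edges = [
    ("obj_t1", "act_t1"),  # object-affordance / sub-activity relation
    ("obj_t2", "act_t2"),
    ("act_t1", "act_t2"),  # temporal evolution of the sub-activity
    ("obj_t1", "obj_t2"),  # temporal evolution of the affordance
]
edge_w = {
    ("reachable", "reaching"): 1.0,
    ("movable", "moving"): 1.0,
    ("reaching", "moving"): 0.5,
    ("reachable", "movable"): 0.5,
}

labels, total = best_labeling(node_w, edge_w, edges)
print(labels, total)
```

With these made-up weights, the edge terms pull the second segment toward "movable" and "moving" even though its node terms alone are nearly tied, which is the kind of contextual effect joint labeling is meant to capture.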

