CV, Feb 6, 2015

Visual Recognition by Counting Instances: A Multi-Instance Cardinality Potential Kernel

arXiv:1502.02063v2, 91 citations
AI Analysis

This work addresses visual recognition problems in domains like video analysis by introducing a framework to encode cardinality relations. It appears incremental, since it builds on existing multi-instance models and does not claim broad state-of-the-art impact.

The paper tackles visual recognition by modeling cardinality relationships between instances, such as counting frames in videos or individuals in group activities, to reduce sensitivity to clutter. Experiments on tasks like human activity recognition and video event detection show improved recognition results, though no specific numerical gains are provided.

Many visual recognition problems can be approached by counting instances. To determine whether an event is present in a long internet video, one could count how many frames seem to contain the activity. Classifying the activity of a group of people can be done by counting the actions of individual people. Encoding these cardinality relationships can reduce sensitivity to clutter, in the form of irrelevant frames or individuals not involved in a group activity. Learned parameters can encode how many instances tend to occur in a class of interest. To this end, this paper develops a powerful and flexible framework to infer any cardinality relation between latent labels in a multi-instance model. Hard or soft cardinality relations can be encoded to tackle diverse levels of ambiguity. Experiments on tasks such as human activity recognition, video event detection, and video summarization demonstrate the effectiveness of using cardinality relations for improving recognition results.
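To make the idea of a cardinality relation over latent labels concrete, here is a minimal sketch of how a count-dependent potential can be combined with per-instance scores. It is not the paper's kernel, whose details are not given in this text. It assumes a simple MAP-style formulation in which each instance (a frame, a person) has a real-valued score for being positive and a table gives the bonus or penalty for having exactly k positives. The names `best_bag_score`, `instance_scores`, and `cardinality_potential` are hypothetical.

```python
import math


def best_bag_score(instance_scores, cardinality_potential):
    """Return (best total score, best positive count) for one bag.

    Assumed model (illustrative only): a labeling with k positive
    instances scores sum(scores of the positives) + cardinality_potential[k].
    For a fixed k, the best labeling picks the k highest-scoring instances,
    so sorting once lets us evaluate every k in O(n log n) overall.
    """
    n = len(instance_scores)
    if len(cardinality_potential) != n + 1:
        raise ValueError("need one potential value for each count 0..n")

    ranked = sorted(instance_scores, reverse=True)
    best_score, best_count = -math.inf, 0
    running = 0.0  # sum of the top-k instance scores
    for k in range(n + 1):
        if k > 0:
            running += ranked[k - 1]
        total = running + cardinality_potential[k]
        if total > best_score:
            best_score, best_count = total, k
    return best_score, best_count


# Four video frames with made-up scores.
frame_scores = [2.0, -1.0, 0.5, -3.0]

# Hard relation: "at least one frame is positive" (count 0 is forbidden).
hard = [-math.inf, 0.0, 0.0, 0.0, 0.0]
print(best_bag_score(frame_scores, hard))

# Soft relation: counts other than 1 are penalised rather than forbidden.
soft = [0.0, 1.0, -2.0, -4.0, -6.0]
print(best_bag_score(frame_scores, soft))
```

In this sketch a hard relation is expressed with negative infinity for disallowed counts, while a soft relation uses finite penalties. In a learned model the potential table would be a set of parameters fitted per class rather than values written by hand.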

