CV, Mar 26, 2016

Recognizing Car Fluents from Video

arXiv:1603.08067v1, 53 citations
Originality: Incremental advance
AI Analysis

This paper addresses a novel computer vision task with potential applications in surveillance and automotive systems, but the contribution is incremental because it adapts existing hierarchical models to a specific domain.

The paper tackles the problem of recognizing time-varying states of vehicles (car fluents) from video, such as open doors or blinking lights. It proposes a spatial-temporal And-Or hierarchical model learned with a latent structural SVM, and reports that the model outperforms baseline methods in fluent recognition and part localization on a newly collected dataset.

Physical fluents, a term originally used by Newton [40], refer to time-varying object states in dynamic scenes. In this paper, we are interested in inferring the fluents of vehicles from video. For example, a door (hood, trunk) is opened or closed through various actions, or a light blinks to signal a turn. Recognizing these fluents has broad applications, yet has received scant attention in the computer vision literature. Car fluent recognition entails a unified framework for car detection, car part localization and part status recognition, which is made difficult by large structural and appearance variations, low resolutions and occlusions. This paper learns a spatial-temporal And-Or hierarchical model to represent car fluents. The learning of this model is formulated under the latent structural SVM framework. Since there is no publicly available related dataset, we collect and annotate a car fluent dataset consisting of car videos with diverse fluents. In experiments, the proposed method outperforms several highly related baseline methods in terms of car fluent recognition and car part localization.
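
For orientation, the generic latent structural SVM objective that this kind of formulation builds on can be sketched as below. This is the standard textbook form, not the paper's exact objective: the symbols are assumptions introduced here for illustration, with x_i a training video, y_i its annotated output (for example, part locations and fluent labels), h the latent variables (for example, unannotated structure in the And-Or hierarchy), Φ a joint feature map, Δ a task loss, and C a regularization constant. The abstract does not specify the paper's actual feature map, loss, or latent variables.

```latex
\min_{w} \; \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i=1}^{n} \Big(
      \max_{(\hat{y},\,\hat{h})}
        \big[\, w^{\top} \Phi(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}) \,\big]
      \;-\; \max_{h} \, w^{\top} \Phi(x_i, y_i, h)
    \Big)
```

The first inner maximum is loss-augmented inference over all outputs and latent assignments. The second completes the latent variables for the ground-truth output. The objective is a difference of convex functions in w, so it is typically minimized by alternating between these two steps.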

