CV | Jul 19, 2017

Deep View-Sensitive Pedestrian Attribute Inference in an end-to-end Model

M. Saquib Sarfraz, Arne Schumann, Yan Wang, Rainer Stiefelhagen

arXiv:1707.06089v1 | 12.3100 citations

Originality: Incremental advance

AI Analysis

This work addresses pedestrian attribute inference, a demanding problem in visual surveillance that supports person retrieval and indexing. The advance is incremental because it builds on existing multi-label classification approaches.

The paper tackles pedestrian attribute inference by proposing an end-to-end model that jointly predicts the coarse pose (view) of the pedestrian and view-specific attributes, which yields better attribute predictions. The model is reported to be competitive with, and to improve on, the published state of the art on three datasets (PETA, RAP, WIDER).

Pedestrian attribute inference is a demanding problem in visual surveillance that can facilitate person retrieval, search and indexing. To exploit semantic relations between attributes, recent research treats it as a multi-label image classification task. The visual cues hinting at attributes can be strongly localized, and the inference of person attributes such as hair, backpack, shorts, etc., is highly dependent on the acquired view of the pedestrian. In this paper we assert this dependence in an end-to-end learning framework and show that view-sensitive attribute inference is able to learn better attribute predictions. Our proposed model jointly predicts the coarse pose (view) of the pedestrian and learns specialized view-specific multi-label attribute predictions. We show in an extensive evaluation on three challenging datasets (PETA, RAP and WIDER) that our proposed end-to-end view-aware attribute prediction model provides competitive performance and improves on the published state-of-the-art on these datasets.
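As a rough illustration of what "view-sensitive" multi-label prediction can look like, the sketch below combines per-view attribute scores using predicted view probabilities. This is a minimal sketch under stated assumptions, not the authors' architecture: the abstract does not say how the view prediction and the view-specific predictions are combined, so the weighted sum, the function name `view_weighted_attributes`, the three example views and all numeric values are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw view scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Independent per-attribute probability (multi-label, not multi-class)."""
    return 1.0 / (1.0 + math.exp(-x))


def view_weighted_attributes(view_logits, attr_logits_per_view):
    """Weight each view-specific attribute prediction by its view probability.

    ASSUMPTION: a probability-weighted sum over views; the abstract does not
    specify the combination rule.

    view_logits          -- one raw score per coarse view
    attr_logits_per_view -- one list of attribute logits per view
    """
    view_probs = softmax(view_logits)
    num_attrs = len(attr_logits_per_view[0])
    combined = [0.0] * num_attrs
    for p, attr_logits in zip(view_probs, attr_logits_per_view):
        for j, logit in enumerate(attr_logits):
            combined[j] += p * sigmoid(logit)
    return view_probs, combined


# Hypothetical values: three coarse views (front, back, side),
# two attributes (e.g. "backpack", "long hair").
view_logits = [2.0, 0.1, -1.0]
attr_logits_per_view = [
    [3.0, -2.0],   # attribute logits from the front-view branch
    [-1.0, 4.0],   # attribute logits from the back-view branch
    [0.5, 0.5],    # attribute logits from the side-view branch
]
view_probs, attr_probs = view_weighted_attributes(view_logits, attr_logits_per_view)
```

In this sketch the branch for the most likely view dominates the final score, so an attribute that is only visible from behind contributes little when the pedestrian is predicted to face the camera.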
