CV
Nov 29, 2014

Pedestrian Detection aided by Deep Learning Semantic Tasks

arXiv:1412.0069v1
439 citations
Originality: Incremental advance
AI Analysis

This work improves the accuracy of pedestrian detection for autonomous driving through multi-task learning. It is an incremental advance, since it builds on existing deep learning methods.

The paper tackles pedestrian detection by jointly optimizing detection with semantic tasks like pedestrian and scene attributes, transferring scene attributes from segmentation datasets to reduce annotation costs. The approach reduces miss rates by 17% on Caltech and 5.5% on ETH datasets compared to previous deep models.

Deep learning methods have achieved great success in pedestrian detection, owing to their ability to learn features from raw pixels. However, they mainly capture middle-level representations, such as the pose of a pedestrian, and confuse positive samples with hard negative samples that are highly ambiguous; for example, the shape and appearance of a 'tree trunk' or 'wire pole' resemble those of a pedestrian from certain viewpoints. This ambiguity can be resolved by high-level representations. To this end, this work jointly optimizes pedestrian detection with semantic tasks, including pedestrian attributes (e.g. 'carrying backpack') and scene attributes (e.g. 'road', 'tree', and 'horizontal'). Rather than expensively annotating scene attributes, we transfer attribute information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model that learns high-level features from multiple tasks and multiple data sources. Since distinct tasks have distinct convergence rates and data from different datasets have different distributions, a multi-task objective function is carefully designed to coordinate tasks and reduce discrepancies among datasets. The importance coefficients of tasks and the network parameters in this objective function can be iteratively estimated. Extensive evaluations show that the proposed approach outperforms the state of the art on the challenging Caltech and ETH datasets, where it reduces the miss rates of previous deep models by 17 and 5.5 percent, respectively.
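
The abstract does not give the form of the multi-task objective, so the following is only a minimal sketch of the general idea: a main detection loss plus attribute losses scaled by per-task importance coefficients. The function names, the use of cross-entropy, the attribute keys, and the convention of marking a missing label as None (for samples whose source dataset does not annotate that attribute) are all assumptions made for illustration, not the paper's formulation. The coefficients are fixed here, whereas the paper estimates them iteratively together with the network parameters.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy between a predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multitask_loss(det_prob, det_label, attr_probs, attr_labels, attr_weights):
    """Detection loss plus importance-weighted attribute losses.

    Hypothetical helper: attr_labels maps an attribute name to 0, 1, or None.
    None means the sample's source dataset has no label for that attribute,
    so the corresponding term is skipped.
    """
    loss = binary_cross_entropy(det_prob, det_label)  # main task
    for name, weight in attr_weights.items():
        label = attr_labels.get(name)
        if label is None:
            continue  # attribute not annotated for this sample
        loss += weight * binary_cross_entropy(attr_probs[name], label)
    return loss


# A pedestrian-dataset sample: it carries a pedestrian attribute label
# but no scene attribute labels.
probs = {"carrying_backpack": 0.8, "road": 0.3, "tree": 0.6}
labels = {"carrying_backpack": 1, "road": None, "tree": None}
weights = {"carrying_backpack": 0.5, "road": 0.1, "tree": 0.1}

loss = multitask_loss(0.9, 1, probs, labels, weights)
```

In this sketch a sample contributes only the terms for which it has labels, which is one simple way to train on several datasets with different annotations. It does not attempt to model the abstract's other point, reducing distribution discrepancies among datasets.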
