SAM-RCNN: Scale-Aware Multi-Resolution Multi-Channel Pedestrian Detection
This work addresses pedestrian detection for autonomous driving and surveillance, offering incremental improvements over existing multi-layer feature aggregation methods.
The paper tackles pedestrian detection across varying scales by proposing a Scale-Aware Multi-resolution (SAM) method that adaptively selects CNN features based on pedestrian size. An enhanced version (SAM+) incorporates complementary feature channels, and the authors report superior performance on the Caltech and KITTI benchmarks.
Convolutional neural networks (CNNs) have enabled significant improvements in pedestrian detection owing to the strong representation ability of CNN features. Recently, aggregating features from multiple layers of a CNN has been considered an effective approach; however, the same feature representation is used for detecting pedestrians of varying scales. Consequently, the feature representation is not guaranteed to be optimised for pedestrians of any particular scale. In this paper, we propose a Scale-Aware Multi-resolution (SAM) method for pedestrian detection which adaptively selects multi-resolution convolutional features according to pedestrian size. Given the size of a pedestrian candidate output by a region proposal network, the proposed SAM method extracts the CNN features that have strong representation ability as well as sufficient feature resolution. Moreover, we propose an enhanced SAM method, termed SAM+, which incorporates complementary feature channels and achieves further performance improvement. Evaluations on the challenging Caltech and KITTI pedestrian benchmarks demonstrate the superiority of our proposed method.
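The scale-aware selection idea can be illustrated with a minimal sketch. This is not the selection rule used in the paper, which the abstract does not specify. It only shows one plausible way to trade off representation strength (deeper layers) against feature resolution (shallower layers) given a proposal's height. The layer names, strides, the `MIN_CELLS` threshold, and the `select_level` helper are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureLevel:
    name: str    # hypothetical layer name, not taken from the paper
    stride: int  # input pixels covered by one feature-map cell


# Ordered from deepest (strongest semantics, coarsest resolution)
# to shallowest (weakest semantics, finest resolution).
LEVELS = (
    FeatureLevel("conv5", 16),
    FeatureLevel("conv4", 8),
    FeatureLevel("conv3", 4),
)

# Assumed minimum number of feature-map rows a proposal should span
# for the features to count as having "sufficient resolution".
MIN_CELLS = 7


def select_level(box_height_px, levels=LEVELS, min_cells=MIN_CELLS):
    """Pick the deepest level at which the proposal still spans at
    least `min_cells` rows; fall back to the shallowest level."""
    for level in levels:
        if box_height_px / level.stride >= min_cells:
            return level
    return levels[-1]


if __name__ == "__main__":
    # Proposal heights in pixels, e.g. as output by a region proposal network.
    for height in (200, 80, 40, 10):
        print(height, "->", select_level(height).name)
```

Under these assumed strides and threshold, a tall proposal is served by the deepest layer, while smaller proposals fall through to shallower, higher-resolution layers. A real system would likely combine several layers per scale range rather than pick a single one.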