CV | Jul 22, 2022

3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

arXiv:2207.10895v2 | 38 citations | h-index: 25
Originality: Incremental advance
AI Analysis

The paper addresses overfitting in multi-view pedestrian detection caused by limited annotated data, a problem that matters for applications such as surveillance and autonomous driving. The advance is incremental, since it builds on existing deep-learning methods.

The paper tackles the problem of pedestrian detection under heavy occlusions in multi-camera systems by proposing a data augmentation method that generates 3D cylinder occlusions and projects features across multiple heights, resulting in greatly improved performance compared to state-of-the-art methods.

Although deep-learning-based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Multi-view information fusion is a potential solution, but its applications are limited by the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed that randomly generates 3D cylinder occlusions on the ground plane, each of the average size of a pedestrian, and projects them to multiple views to relieve the impact of overfitting during training. Moreover, the feature map of each view is projected by homographies to multiple parallel planes at different heights, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method greatly improves performance in comparison with state-of-the-art deep-learning-based methods for multi-view pedestrian detection.
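The two geometric steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each camera is described by a 3x4 pinhole projection matrix P and that the ground plane is Z = 0 in world coordinates. The function names, the cylinder radius and height, and the example camera are all hypothetical. For a plane at height h, the homography follows from P by fixing Z = h, which gives H_h = [p1, p2, h*p3 + p4], where p1..p4 are the columns of P.

```python
import numpy as np

def plane_homography(P, height):
    """Homography taking (X, Y, 1) on the world plane Z = height to image pixels.

    P is a 3x4 projection matrix (assumed pinhole model, no lens distortion).
    """
    P = np.asarray(P, dtype=float)
    return np.column_stack([P[:, 0], P[:, 1], height * P[:, 2] + P[:, 3]])

def project_points(P, pts_3d):
    """Project an (N, 3) array of world points to (N, 2) pixel coordinates."""
    pts = np.asarray(pts_3d, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    uvw = homog @ np.asarray(P, dtype=float).T
    return uvw[:, :2] / uvw[:, 2:3]

def cylinder_bbox(P, centre_xy, radius=0.3, height=1.8, n=32):
    """Image bounding box of a cylinder standing on the ground plane.

    radius and height are illustrative values, not figures from the paper.
    The box is a coarse stand-in for the projected occlusion region.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ring = np.stack([centre_xy[0] + radius * np.cos(angles),
                     centre_xy[1] + radius * np.sin(angles)], axis=1)
    bottom = np.column_stack([ring, np.zeros(n)])
    top = np.column_stack([ring, np.full(n, height)])
    uv = project_points(P, np.vstack([bottom, top]))
    return uv.min(axis=0), uv.max(axis=0)

# Hypothetical camera: intrinsics K and pose [I | t], for illustration only.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [10.0]])])
P_example = K @ Rt

# One homography per layer; the chosen heights are an assumption.
layer_heights = [0.0, 0.6, 1.2, 1.8]
layer_homographies = [plane_homography(P_example, h) for h in layer_heights]

# A random occlusion: sample a ground position and project the cylinder.
rng = np.random.default_rng(0)
centre = rng.uniform(-2.0, 2.0, size=2)
bbox_min, bbox_max = cylinder_bbox(P_example, centre)
```

In a training pipeline, the region given by `cylinder_bbox` (or a tighter convex hull of the projected points) would be filled in each view, and the inverse of each layer homography would be used to warp that view's feature map onto the corresponding plane. Both of those steps are left out here because the summary does not specify them.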

Code Implementations: 1 repo
