CV, LG, IV | Mar 16, 2020

PS-RCNN: Detecting Secondary Human Instances in a Crowd via Primary Object Suppression

arXiv:2003.07080v1 | 39 citations
AI Analysis

This paper addresses the detection of heavily occluded humans in crowded scenes, a computer vision problem, and represents an incremental advance over existing methods.

The paper tackles the detection of human bodies in crowded scenes by introducing PS-RCNN, a two-stage detector that suppresses primary objects so that occluded instances stand out, improving recall by 4.49% and AP by 2.92% over the baseline on the CrowdHuman dataset.

Detecting human bodies in highly crowded scenes is a challenging problem, for two main reasons: 1) the weak visual cues of heavily occluded instances can hardly provide sufficient information for accurate detection; 2) heavily occluded instances are more easily suppressed by Non-Maximum Suppression (NMS). To address these two issues, we introduce a variant of two-stage detectors called PS-RCNN. PS-RCNN first detects slightly occluded or non-occluded objects with an R-CNN module (referred to as P-RCNN), and then suppresses the detected instances with human-shaped masks so that the features of heavily occluded instances can stand out. After that, PS-RCNN uses another R-CNN module specialized in heavily occluded human detection (referred to as S-RCNN) to detect the objects missed by P-RCNN. The final results are the ensemble of the outputs from these two R-CNNs. Moreover, we introduce a High Resolution RoI Align (HRRA) module to retain as much as possible of the fine-grained features of the visible parts of heavily occluded humans. PS-RCNN significantly improves recall and AP by 4.49% and 2.92% respectively on CrowdHuman, compared to the baseline. PS-RCNN achieves similar improvements on WiderPerson.
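
The detect, suppress, detect, ensemble flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (human_shaped_mask, suppress_primary, ps_rcnn_sketch) are hypothetical, the ellipse is only a crude stand-in for the paper's human-shaped masks, the two detectors are assumed to be arbitrary callables returning (box, score) pairs, and the ensemble is assumed to be a plain concatenation of both outputs.

```python
import numpy as np


def human_shaped_mask(height, width):
    """Hypothetical stand-in for a human-shaped mask.

    Returns an array of shape (height, width) that is 0 inside an
    ellipse inscribed in the box and 1 outside it.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    ry, rx = height / 2.0, width / 2.0
    inside = ((ys - cy) / ry) ** 2 + ((xs - cx) / rx) ** 2 <= 1.0
    return np.where(inside, 0.0, 1.0)


def suppress_primary(features, boxes):
    """Zero out activations under each primary detection.

    features: float array of shape (C, H, W).
    boxes: iterable of integer (x1, y1, x2, y2) in feature-map
           coordinates, with x2 and y2 exclusive (an assumption).
    Returns a new array; the input is left unchanged.
    """
    out = features.copy()
    for x1, y1, x2, y2 in boxes:
        if x2 <= x1 or y2 <= y1:
            continue  # skip degenerate boxes
        out[:, y1:y2, x1:x2] *= human_shaped_mask(y2 - y1, x2 - x1)
    return out


def ps_rcnn_sketch(features, primary_detector, secondary_detector):
    """Run the primary detector, suppress its detections, run the
    secondary detector on the suppressed features, and merge both."""
    primary = primary_detector(features)
    suppressed = suppress_primary(features, [box for box, _ in primary])
    secondary = secondary_detector(suppressed)
    # Assumed ensemble: simple concatenation of both detection lists.
    return list(primary) + list(secondary)
```

In this sketch the secondary detector only ever sees features with the primary instances masked out, which mirrors the stated goal of letting heavily occluded instances stand out. How the masks are shaped and how the two output sets are merged in the actual method is not specified in the abstract.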

