MPASNET: Motion Prior-Aware Siamese Network for Unsupervised Deep Crowd Segmentation in Video Scenes
This addresses the problem of obtaining refined pixel-level segmentation for crowded scene analysis without manual labeling, representing a strong specific gain in this domain.
The paper tackles unsupervised crowd segmentation in videos by proposing MPASNET, which eliminates the need for annotations and achieves over 12% higher mIoU than state-of-the-art methods on benchmark datasets.
Crowd segmentation is a fundamental task serving as the basis of crowded scene analysis, and it is highly desirable to obtain refined pixel-level segmentation maps. However, it remains a challenging problem, as existing approaches either require dense pixel-level annotations to train deep learning models or merely produce rough segmentation maps from optical or particle flows with physical models. In this paper, we propose the Motion Prior-Aware Siamese Network (MPASNET) for unsupervised crowd semantic segmentation. This model not only eliminates the need for annotation but also yields high-quality segmentation maps. Specially, we first analyze the coherent motion patterns across the frames and then apply a circular region merging strategy on the collective particles to generate pseudo-labels. Moreover, we equip MPASNET with siamese branches for augmentation-invariant regularization and siamese feature aggregation. Experiments over benchmark datasets indicate that our model outperforms the state-of-the-arts by more than 12% in terms of mIoU.