CV · Oct 30, 2023

Towards Grouping in Large Scenes with Occlusion-aware Spatio-temporal Transformers

arXiv:2310.19447v1
Originality: Highly original
AI Analysis

This paper addresses accurate group detection in crowded, occluded environments, with applications in public safety and smart cities. It represents a strong, specific gain rather than a foundational advance.

The paper tackles group detection in large-scale scenes with frequent occlusions by proposing GroupTransformer, an end-to-end framework that uses an occlusion encoder and spatio-temporal transformers, achieving over 10% improvement in precision and F1 score on large-scale scenes and over 5% on small-scale scenes compared to state-of-the-art methods.

Group detection, especially for large-scale scenes, has many potential applications for public safety and smart cities. Existing methods fail to cope with the frequent occlusions in large-scale scenes with multiple people, and struggle to use spatio-temporal information effectively. In this paper, we propose an end-to-end framework, GroupTransformer, for group detection in large-scale scenes. To deal with the frequent occlusions caused by multiple people, we design an occlusion encoder to detect and suppress severely occluded person crops. To explore the potential spatio-temporal relationships, we propose spatio-temporal transformers that simultaneously extract trajectory information and fuse inter-person features in a hierarchical manner. Experimental results on both large-scale and small-scale scenes demonstrate that our method outperforms state-of-the-art methods. On large-scale scenes, our method improves precision and F1 score by more than 10%. On small-scale scenes, it still improves F1 score by more than 5%. The project page with code can be found at http://cic.tju.edu.cn/faculty/likun/projects/GroupTrans.
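The abstract reports its gains in precision and F1 score. As a rough illustration of how these metrics relate for group detection, here is a minimal sketch. It assumes a predicted group counts as correct only when its member set exactly matches a ground-truth group; this matching rule, the function names, and the toy data are assumptions for illustration and may differ from the evaluation protocol used in the paper.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Group = FrozenSet[int]  # a group is a set of person IDs


def to_groups(groups: Iterable[Iterable[int]]) -> Set[Group]:
    """Normalize a list of ID lists into a set of frozensets."""
    return {frozenset(g) for g in groups}


def group_prf(
    predicted: Iterable[Iterable[int]],
    ground_truth: Iterable[Iterable[int]],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 under exact-match group matching (an assumption)."""
    pred = to_groups(predicted)
    gt = to_groups(ground_truth)
    true_positives = len(pred & gt)  # groups present in both sets
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: person IDs grouped by a detector vs. annotations.
predicted = [[1, 2], [3, 4, 5], [6, 7]]
ground_truth = [[1, 2], [3, 4], [6, 7], [8, 9]]
precision, recall, f1 = group_prf(predicted, ground_truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a method that suppresses false groups (for example, ones built from heavily occluded crops) can raise precision and F1 together without changing recall.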

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
