CV · Apr 15

Interpretable Human Activity Recognition for Subtle Robbery Detection in Surveillance Videos

arXiv:2604.14329 · 12.8 · h-index: 3
Predicted impact: top 94% in CV · last 90 days · Originality: Synthesis-oriented
AI Analysis

For surveillance and public safety, this work provides an interpretable, edge-deployable method for detecting a previously underexplored crime type, though it is incremental in combining existing techniques.

The paper tackles automatic detection of subtle snatch-and-run robberies in surveillance videos, achieving real-time performance on an NVIDIA Jetson Nano with promising generalization across scenes.

Non-violent street robberies (snatch-and-run) are difficult to detect automatically because they are brief, subtle, and often indistinguishable from benign human interactions in unconstrained surveillance footage. This paper presents a hybrid, pose-driven approach for detecting snatch-and-run events that combines real-time perception with an interpretable classification stage suitable for edge deployment. The system uses a YOLO-based pose estimator to extract body keypoints for each tracked person and computes kinematic and interaction features describing hand speed, arm extension, proximity, and relative motion between an aggressor-victim pair. A Random Forest classifier is trained on these descriptors, and a temporal hysteresis filter is applied to stabilize frame-level predictions and reduce spurious alarms. We evaluate the method on a staged dataset and on a disjoint test set collected from internet videos, demonstrating promising generalization across different scenes and camera viewpoints. Finally, we implement the complete pipeline on an NVIDIA Jetson Nano and report real-time performance, supporting the feasibility of proactive, on-device robbery detection.
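The abstract does not give the exact feature definitions or the hysteresis rule, so the following is only a minimal sketch of how two of the pose-derived descriptors and the temporal smoothing step might look. The function names, the pixel-based units, and the thresholds (`on_thresh`, `off_thresh`, `min_on`) are all illustrative assumptions, not values from the paper. The per-frame probabilities are assumed to come from a classifier such as the Random Forest described above.

```python
import math

def hand_speed(prev_xy, curr_xy, fps):
    """Wrist speed in pixels per second between two consecutive frames."""
    return math.dist(prev_xy, curr_xy) * fps

def arm_extension(shoulder, elbow, wrist):
    """Shoulder-to-wrist distance divided by total arm length.

    Returns 1.0 for a fully straight arm and smaller values as the arm bends.
    """
    arm_len = math.dist(shoulder, elbow) + math.dist(elbow, wrist)
    return math.dist(shoulder, wrist) / arm_len if arm_len > 0 else 0.0

def hysteresis_filter(probs, on_thresh=0.7, off_thresh=0.4, min_on=3):
    """Convert per-frame robbery probabilities into a stable alarm signal.

    Assumed rule (hypothetical): the alarm turns on only after `min_on`
    consecutive frames at or above `on_thresh`, and turns off only when
    the probability drops below the lower `off_thresh`.
    """
    alarm = False
    streak = 0
    out = []
    for p in probs:
        if not alarm:
            # Count consecutive high-confidence frames before raising the alarm.
            streak = streak + 1 if p >= on_thresh else 0
            if streak >= min_on:
                alarm = True
        elif p < off_thresh:
            # Release the alarm only once confidence falls well below on_thresh.
            alarm = False
            streak = 0
        out.append(alarm)
    return out

# Example with made-up keypoints and probabilities.
speed = hand_speed((0.0, 0.0), (3.0, 4.0), fps=30)
extension = arm_extension((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
frame_probs = [0.1, 0.8, 0.2, 0.8, 0.9, 0.75, 0.5, 0.45, 0.3, 0.2]
alarms = hysteresis_filter(frame_probs)
```

Using two thresholds rather than one means a single noisy frame neither triggers nor cancels an alarm: the isolated 0.8 at the second frame is ignored, while the dip to 0.5 after the alarm is raised does not immediately clear it.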

