Ano-Graph: Learning Normal Scene Contextual Graphs to Detect Video Anomalies
It addresses the problem of detecting anomalies in videos for surveillance and security applications, offering an incremental improvement by modeling object interactions more effectively than prior methods.
The paper tackles video anomaly detection by learning normal object interactions via a spatio-temporal graph, achieving state-of-the-art performance on datasets like ADOC and Street Scene with significant robustness to real-world variations.
Video anomaly detection has proved to be a challenging task owing to its unsupervised training procedure and high spatio-temporal complexity existing in real-world scenarios. In the absence of anomalous training samples, state-of-the-art methods try to extract features that fully grasp normal behaviors in both space and time domains using different approaches such as autoencoders, or generative adversarial networks. However, these approaches completely ignore or, by using the ability of deep networks in the hierarchical modeling, poorly model the spatio-temporal interactions that exist between objects. To address this issue, we propose a novel yet efficient method named Ano-Graph for learning and modeling the interaction of normal objects. Towards this end, a Spatio-Temporal Graph (STG) is made by considering each node as an object's feature extracted from a real-time off-the-shelf object detector, and edges are made based on their interactions. After that, a self-supervised learning method is employed on the STG in such a way that encapsulates interactions in a semantic space. Our method is data-efficient, significantly more robust against common real-world variations such as illumination, and passes SOTA by a large margin on the challenging datasets ADOC and Street Scene while stays competitive on Avenue, ShanghaiTech, and UCSD.