CV | AI | Dec 28, 2024

Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems

arXiv:2412.20201v2 | 2 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and explainable anomaly detection on edge devices in smart cities, though it appears incremental as it builds on existing weakly supervised and multimodal techniques.

The paper tackles the problem of real-time and interpretable anomaly detection for smart city monitoring by proposing TCVADS, a two-stage system that uses knowledge distillation and cross-modal contrastive learning, achieving significant improvements in performance, efficiency, and interpretability over existing methods.

Weakly Supervised Monitoring Anomaly Detection (WSMAD) utilizes weak supervision learning to identify anomalies, a critical task for smart city monitoring. However, existing multimodal approaches often fail to meet the real-time and interpretability requirements of edge devices due to their complexity. This paper presents TCVADS (Two-stage Cross-modal Video Anomaly Detection System), which leverages knowledge distillation and cross-modal contrastive learning to enable efficient, accurate, and interpretable anomaly detection on edge devices.

TCVADS operates in two stages: coarse-grained rapid classification and fine-grained detailed analysis. In the first stage, TCVADS extracts features from video frames and inputs them into a time series analysis module, which acts as the teacher model. Insights are then transferred via knowledge distillation to a simplified convolutional network (student model) for binary classification. Upon detecting an anomaly, the second stage is triggered, employing a fine-grained multi-class classification model. This stage uses CLIP for cross-modal contrastive learning with text and images, enhancing interpretability and achieving refined classification through specially designed triplet textual relationships. Experimental results demonstrate that TCVADS significantly outperforms existing methods in model performance, detection efficiency, and interpretability, offering valuable contributions to smart city monitoring applications.
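The two-stage flow described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes a Hinton-style temperature-softened KL term as the distillation loss (the abstract does not state which loss TCVADS uses), and it stands in for CLIP with hand-written toy vectors. The function names, the 0.5 threshold, the temperature, and the class names `fighting` and `explosion` are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: a standard soft-target distillation term, scaled by T^2.
    The loss actually used by TCVADS may differ.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def two_stage_detect(student_logits, frame_embedding, text_embeddings,
                     threshold=0.5):
    """Stage 1 gates on the student's binary output; stage 2 runs only
    when the gate fires and picks the closest text prompt embedding.

    student_logits: [normal_logit, anomaly_logit] from the student model.
    frame_embedding / text_embeddings: toy stand-ins for CLIP image and
    text embeddings (hypothetical values, not real CLIP outputs).
    """
    p_anomaly = softmax(student_logits)[1]
    if p_anomaly < threshold:
        return "normal", p_anomaly  # the costly second stage is skipped
    label = max(
        text_embeddings,
        key=lambda name: cosine(frame_embedding, text_embeddings[name]),
    )
    return label, p_anomaly


# Hypothetical class prompts, already embedded as toy 2-D vectors.
prompts = {"fighting": [0.9, 0.1], "explosion": [0.1, 0.9]}

label, score = two_stage_detect([0.0, 2.0], [1.0, 0.0], prompts)
print(label, round(score, 3))
```

The point of the gate is that the cheap student runs on every clip, while the text-image comparison runs only on clips flagged as anomalous, which is where the efficiency on edge devices would come from under these assumptions.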

