CV · Mar 6, 2023

Weakly Supervised Realtime Dynamic Background Subtraction

arXiv:2303.02857v1 · 3 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses the challenge of expensive labeling in video surveillance and object tracking by providing an efficient, real-time solution, though it is incremental as it builds on existing weakly supervised methods.

The paper tackles the problem of dynamic background subtraction in computer vision by proposing a weakly supervised framework that eliminates the need for pixel-wise ground-truth labels, achieving results that outperform top-ranked unsupervised methods and are competitive with existing weakly supervised approaches on datasets like CDnet 2014 and I2R.

Background subtraction is a fundamental task in computer vision with numerous real-world applications, ranging from object tracking to video surveillance. Dynamic backgrounds pose a significant challenge here. Supervised deep learning-based techniques are currently considered state-of-the-art for this task. However, these methods require pixel-wise ground-truth labels, which are time-consuming and expensive to produce. In this work, we propose a weakly supervised framework that can perform background subtraction without requiring per-pixel ground-truth labels. Our framework is trained on a sequence of images free of moving objects and comprises two networks. The first network is an autoencoder that generates background images and prepares dynamic background images for training the second network. The dynamic background images are obtained by thresholding the background-subtracted images. The second network is a U-Net that is trained on the same object-free video, using the dynamic background images as pixel-wise ground-truth labels. During the test phase, the input images are processed by the autoencoder and the U-Net, which generate background and dynamic background images, respectively. The dynamic background image is used to remove dynamic motion from the background-subtracted image, yielding a foreground image that is free of dynamic artifacts. To demonstrate the effectiveness of our method, we conducted experiments on selected categories of the CDnet 2014 dataset and on the I2R dataset. Our method outperformed all top-ranked unsupervised methods. It also achieved better results than one of the two existing weakly supervised methods and performed similarly to the other. Our proposed method is online, real-time, and efficient, and it requires minimal frame-level annotation, making it suitable for a wide range of real-world applications.
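
The test-phase combination described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two networks are replaced by hand-written stand-in arrays, the function names `threshold_difference` and `remove_dynamic_background` are hypothetical, and the threshold value is an arbitrary assumption. It only shows how a thresholded background-subtracted image and a dynamic background mask might be combined into a foreground mask.

```python
import numpy as np


def threshold_difference(frame, background, threshold=0.2):
    """Return a boolean mask of pixels that differ from the background.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    return diff > threshold


def remove_dynamic_background(changed_mask, dynamic_mask):
    """Keep changed pixels that are not flagged as dynamic background."""
    return changed_mask & ~dynamic_mask


# Toy 4x4 grayscale example with intensities in [0, 1].
# Stand-in for the background image the autoencoder would generate.
background = np.zeros((4, 4), dtype=np.float32)

frame = background.copy()
frame[0, 0] = 1.0  # a moving object
frame[3, 3] = 0.8  # background motion, e.g. waving leaves

# Stand-in for the U-Net prediction: it flags only the background motion.
dynamic_mask = np.zeros((4, 4), dtype=bool)
dynamic_mask[3, 3] = True

changed_mask = threshold_difference(frame, background)
foreground = remove_dynamic_background(changed_mask, dynamic_mask)
```

In this toy case `changed_mask` marks both the object and the background motion, while `foreground` keeps only the object pixel. During training on object-free video, the same thresholding step applied to the autoencoder's output would yield masks containing only dynamic background, which is how the abstract describes obtaining labels for the U-Net.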

