CV LG Jun 7, 2021

CDN-MEDAL: Two-stage Density and Difference Approximation Framework for Motion Analysis

Synh Viet-Uyen Ha, Cuong Tien Nguyen, Hung Ngoc Phan, Nhat Minh Chung, Phuong Hoai Ha

arXiv:2106.03776v4

Originality: Incremental advance

AI Analysis

This work addresses motion analysis for video surveillance applications, representing an incremental improvement by hybridizing statistical and deep learning approaches.

The paper tackles the problem of background modeling and subtraction for video surveillance by proposing a two-stage framework combining Gaussian Mixture Models and convolutional neural networks, resulting in an efficient method with rapid convergence and effective extraction of moving objects in unseen cases.

Background modeling and subtraction is a promising research area with a variety of applications in video surveillance. Recent years have witnessed a proliferation of effective learning-based deep neural networks in this area. However, these techniques provide only limited descriptions of a scene's properties while requiring heavy computation, because their single-valued mapping functions are learned to approximate the temporal conditional averages of observed target backgrounds and foregrounds. On the other hand, statistical learning in imagery domains has been a prevalent approach that adapts well to dynamic context transformation, notably through Gaussian Mixture Models (GMMs) and their generalization capabilities. By leveraging both, we propose a novel method called CDN-MEDAL-net for background modeling and subtraction with two convolutional neural networks. The first architecture, CDN-GM, is grounded on an unsupervised GMM statistical learning strategy to describe the salient features of observed scenes. The second, MEDAL-net, implements a lightweight pipeline for online video background subtraction. Our two-stage architecture is small, yet it converges rapidly to representations of intricate motion patterns. Our experiments show that the proposed approach not only extracts regions of moving objects effectively in unseen cases but is also very efficient.
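To make the statistical side of the abstract concrete, here is a minimal sketch of classical per-pixel GMM background modeling on a single grayscale pixel. It is not the paper's CDN-GM or MEDAL-net: it is a plain online mixture update of the kind the abstract refers to as GMM-based statistical learning. The function names (`fit_pixel_gmm`, `is_foreground`) and all parameter values (`lr`, `init_var`, `min_var`, `match_sigma`, `min_weight`) are assumptions chosen for illustration.

```python
import math


def fit_pixel_gmm(samples, k=3, lr=0.05, init_var=225.0, min_var=4.0,
                  match_sigma=2.5):
    """Online update of a k-component 1-D GMM over one pixel's intensities.

    Hypothetical helper for illustration; assumes k >= 2 and values in 0..255.
    """
    weights = [1.0 / k] * k
    means = [255.0 * i / (k - 1) for i in range(k)]  # spread over the range
    variances = [init_var] * k
    for x in samples:
        # Components whose mean lies within match_sigma std devs of x.
        matches = [i for i in range(k)
                   if abs(x - means[i]) <= match_sigma * math.sqrt(variances[i])]
        if matches:
            # Update the closest matching component toward the new sample.
            j = min(matches, key=lambda i: abs(x - means[i]))
            weights = [(1.0 - lr) * w for w in weights]
            weights[j] += lr
            means[j] += lr * (x - means[j])
            new_var = variances[j] + lr * ((x - means[j]) ** 2 - variances[j])
            variances[j] = max(min_var, new_var)  # floor avoids collapse
        else:
            # No match: replace the least-supported component with the sample.
            j = min(range(k), key=lambda i: weights[i])
            means[j], variances[j], weights[j] = x, init_var, lr
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, means, variances


def is_foreground(x, weights, means, variances, min_weight=0.2,
                  match_sigma=2.5):
    """A pixel is background if it matches any well-supported component."""
    for w, mu, var in zip(weights, means, variances):
        if w >= min_weight and abs(x - mu) <= match_sigma * math.sqrt(var):
            return False
    return True


# A pixel that hovers around intensity 100 with small periodic variation.
history = [100.0 + 2.0 * math.sin(i) for i in range(300)]
weights, means, variances = fit_pixel_gmm(history)
print(is_foreground(100.0, weights, means, variances))
print(is_foreground(200.0, weights, means, variances))
```

In a full pipeline this update would run independently for every pixel of every frame, which is the per-pixel cost that a learned, lightweight subtraction stage is meant to reduce.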
