Multi-View Task-Driven Recognition in Visual Sensor Networks
This work addresses bandwidth constraints in visual sensor networks for applications such as surveillance, but its contribution is incremental: it builds on existing sparse coding and multi-task learning techniques.
The paper tackles efficient visual analysis in bandwidth-limited distributed camera networks. It proposes a multi-view task-driven learning method that uses the relative positions of the cameras as a norm-based regularizer in sparse coding, and demonstrates that the method outperforms state-of-the-art methods in surveillance recognition tasks.
Distributed smart cameras are now deployed for a wide range of tasks across several application scenarios, ranging from object recognition and image retrieval to forensic applications. Because bandwidth is limited in distributed systems, efficient coding of local visual features has been an active topic of research. In this paper, we propose a novel approach to obtain a compact representation of high-dimensional visual data using sensor fusion techniques. We recast the problem of visual analysis in resource-limited scenarios as a multi-view representation learning problem, and we show that the key to finding a properly compressed representation is to exploit the position of the cameras with respect to each other as a norm-based regularization in the sparse coding signal representation. Learning the representation of each camera is viewed as an individual task, and multi-task learning with joint sparsity across all nodes is employed. The proposed representation learning scheme is referred to as multi-view task-driven learning for visual sensor networks (MT-VSN). We demonstrate that MT-VSN outperforms state-of-the-art methods in various surveillance recognition tasks.
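The abstract does not state the optimization problem, so the following is only a minimal sketch of generic multi-task sparse coding with joint sparsity, not the MT-VSN formulation itself. It assumes one dictionary per camera view and an l2,1 penalty that encourages all views to select the same dictionary atoms, solved by proximal gradient descent. The camera-position-dependent weighting of the regularizer is omitted because its form is not given here. The names `row_soft_threshold`, `joint_sparse_codes`, and `lam` are hypothetical and chosen for illustration.

```python
import numpy as np

def row_soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_{2,1}: shrink the l2 norm of each row by tau."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * A

def objective(X, D, A, lam):
    """0.5 * sum_v ||x_v - D_v a_v||^2 + lam * sum of row norms of A."""
    fit = sum(0.5 * np.sum((X[v] - D[v] @ A[:, v]) ** 2) for v in range(len(X)))
    return fit + lam * np.sum(np.linalg.norm(A, axis=1))

def joint_sparse_codes(X, D, lam=0.1, n_iter=200):
    """Jointly sparse codes for V camera views (assumed setup, see lead-in).

    X: list of V feature vectors, X[v] has shape (m_v,)
    D: list of V dictionaries, D[v] has shape (m_v, k)
    Returns A of shape (k, V); column v is the code for view v.
    """
    V, k = len(X), D[0].shape[1]
    A = np.zeros((k, V))
    # Step size 1/L, with L the largest squared spectral norm over the dictionaries.
    L = max(np.linalg.norm(Dv, 2) ** 2 for Dv in D)
    step = 1.0 / L
    for _ in range(n_iter):
        # Gradient of the data-fit term, computed independently per view.
        G = np.column_stack([D[v].T @ (D[v] @ A[:, v] - X[v]) for v in range(V)])
        # The row-wise shrinkage couples the views: an atom is kept or dropped for all of them.
        A = row_soft_threshold(A - step * G, step * lam)
    return A

# Small synthetic usage example: 3 views, 20-dimensional features, 8 atoms.
rng = np.random.default_rng(0)
D_demo = [rng.standard_normal((20, 8)) for _ in range(3)]
X_demo = [rng.standard_normal(20) for _ in range(3)]
A_demo = joint_sparse_codes(X_demo, D_demo, lam=0.5)
```

Because the penalty acts on whole rows of `A`, increasing `lam` removes an atom from every view at once, which is what lets the nodes share a common support instead of coding their features independently.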