4-D Scene Alignment in Surveillance Video
This work addresses the need for robust activity detection in surveillance video by providing a practical, automatic calibration method, though the contribution appears incremental because it builds on existing techniques.
The paper tackles the problem of automatically calibrating fixed surveillance cameras to recover the 3-D scene geometry. It combines a CNN-based camera pose estimator with a vertical scale derived from pedestrian observations to establish 4-D scene alignment. The method requires neither tracking nor explicit head/feet detection, and it is reported to be robust to individual height variations and camera parameter estimation errors.
Designing robust activity detectors for fixed camera surveillance video requires knowledge of the 3-D scene. This paper presents an automatic camera calibration process that provides a mechanism to reason about the spatial proximity between objects at different times. It combines a CNN-based camera pose estimator with a vertical scale provided by pedestrian observations to establish the 4-D scene geometry. Unlike some previous methods, the people do not need to be tracked nor do the head and feet need to be explicitly detected. It is robust to individual height variations and camera parameter estimation errors.
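To make the role of the vertical scale concrete, the following is a minimal sketch and not the paper's procedure. It assumes a zero-roll pinhole camera whose pitch and focal length are already known (for example from a pose estimator), a flat ground plane, and an average pedestrian height of 1.7 m. Unlike the paper, it uses explicit foot and head pixels, such as the bottom and top of a detection box, as stand-ins for the pedestrian observations. All function names and parameters are hypothetical.

```python
import math
from statistics import median

# World frame: X right, Y forward, Z up, ground plane at Z = 0.
# Camera frame: x right, y down, z along the optical axis.

def camera_axes(pitch_rad):
    """Rows of the world-to-camera rotation for a zero-roll camera tilted down by pitch."""
    s, c = math.sin(pitch_rad), math.cos(pitch_rad)
    return ((1.0, 0.0, 0.0), (0.0, -s, -c), (0.0, c, -s))

def pixel_ray(u, v, f, cx, cy, axes):
    """World-frame direction of the viewing ray through pixel (u, v)."""
    d_cam = ((u - cx) / f, (v - cy) / f, 1.0)
    # Apply the transposed rotation: weight each camera axis by its ray component.
    return tuple(sum(d_cam[i] * axes[i][k] for i in range(3)) for k in range(3))

def ground_point(u, v, f, cx, cy, axes, cam_height):
    """Intersect the ray through (u, v) with the ground plane; returns (X, Y)."""
    d = pixel_ray(u, v, f, cx, cy, axes)
    if d[2] >= 0.0:
        raise ValueError("pixel is at or above the horizon")
    t = -cam_height / d[2]
    return (t * d[0], t * d[1])

def height_ratio(foot_px, head_px, f, cx, cy, axes):
    """Person height divided by camera height, assuming the head is directly above the foot."""
    gx, gy = ground_point(*foot_px, f, cx, cy, axes, cam_height=1.0)
    d = pixel_ray(*head_px, f, cx, cy, axes)
    # Walk along the head ray until it reaches the foot's horizontal range.
    s = math.hypot(gx, gy) / math.hypot(d[0], d[1])
    return 1.0 + s * d[2]

def estimate_camera_height(observations, f, cx, cy, pitch_rad, assumed_height_m=1.7):
    """Median camera height over (foot_px, head_px) pairs; the median damps height variation."""
    axes = camera_axes(pitch_rad)
    return median(
        assumed_height_m / height_ratio(foot, head, f, cx, cy, axes)
        for foot, head in observations
    )

def ground_distance(px_a, px_b, f, cx, cy, axes, cam_height):
    """Metric ground-plane distance between two foot pixels, possibly from different frames."""
    ax, ay = ground_point(*px_a, f, cx, cy, axes, cam_height)
    bx, by = ground_point(*px_b, f, cx, cy, axes, cam_height)
    return math.hypot(ax - bx, ay - by)

def project(point, f, cx, cy, axes, cam_height):
    """Project a world point to pixels; used here only to build synthetic observations."""
    rel = (point[0], point[1], point[2] - cam_height)
    x, y, z = (sum(axes[i][k] * rel[k] for k in range(3)) for i in range(3))
    return (f * x / z + cx, f * y / z + cy)
```

Because the camera is fixed, the same calibration applies to every frame, so `ground_distance` can compare detections observed at different times. The median over many pedestrians is one simple way to limit the influence of individual height variation; the paper's own estimator may differ.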