Feedback Driven Multi Stereo Vision System for Real-Time Event Analysis
This work addresses the need for robust scene understanding in interactive systems, for applications ranging from gaming to sensitive environments, though it appears incremental because it builds on existing stereo vision techniques.
The paper tackles the unreliability of 2D and 3D cameras in large, complex environments by proposing a 3D stereo vision pipeline that fuses multiple cameras for full scene reconstruction. This enables tasks such as event recognition and tracking. Preliminary experiments and results are presented.
2D cameras are often used in interactive systems. Other systems, such as gaming consoles, provide more powerful 3D cameras for short-range depth sensing. Overall, these cameras are not reliable in large, complex environments. In this work, we propose a 3D stereo vision based pipeline for interactive systems that can handle both ordinary and sensitive applications through robust scene understanding. We explore the fusion of multiple 3D cameras for full scene reconstruction, which allows a wide range of tasks to be performed, such as event recognition, subject tracking, and notification. Using possible feedback approaches, the system can receive data from the subjects present in the environment, in order to learn to make better decisions or to adapt to completely new environments. Throughout the paper, we introduce the pipeline and explain our preliminary experimentation and results. Finally, we lay out the roadmap of next steps needed to bring this pipeline into production.