Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection
This addresses the problem of robust people tracking in crowded multi-camera setups for applications like surveillance or crowd analysis, representing an incremental improvement over existing methods.
The paper tackles the problem of multi-camera multi-people tracking in crowded scenes, where performance degrades severely, by introducing an architecture that combines Convolutional Neural Nets and Conditional Random Fields to explicitly model ambiguities, and it outperforms several state-of-art algorithms on challenging scenes.
People detection in single 2D images has improved greatly in recent years. However, comparatively little of this progress has percolated into multi-camera multi-people tracking algorithms, whose performance still degrades severely when scenes become very crowded. In this work, we introduce a new architecture that combines Convolutional Neural Nets and Conditional Random Fields to explicitly model those ambiguities. One of its key ingredients are high-order CRF terms that model potential occlusions and give our approach its robustness even when many people are present. Our model is trained end-to-end and we show that it outperforms several state-of-art algorithms on challenging scenes.