Simultaneous multi-view instance detection with learned geometric soft-constraints
This work addresses robust cross-view object detection in urban scenes and contributes a new dataset and method; the contribution is incremental, since it builds on existing multi-view and detection techniques.
The paper tackles multi-view object instance detection under challenging conditions such as viewpoint changes and lighting variations. It jointly learns geometry and appearance across views and outperforms various baselines on a new, large dataset of street-level panoramas.
We propose to jointly learn multi-view geometry and warping between views of the same object instances for robust cross-view object detection. Multi-view object instance detection is difficult because of strong changes in viewpoint and lighting conditions, high similarity of neighbouring objects, and strong variability in scale. By turning object detection and instance re-identification in different views into a joint learning task, we are able to incorporate both image appearance and geometric soft constraints into a single multi-view detection process that is learnable end-to-end. We validate our method on a new, large dataset of street-level panoramas of urban objects and show superior performance compared to various baselines. Our contribution is threefold: a large-scale, publicly available dataset for multi-view instance detection and re-identification; an annotation tool custom-tailored for multi-view instance detection; and a novel, holistic multi-view instance detection and re-identification method that jointly models geometry and appearance across views.
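To illustrate what a geometric soft constraint can mean when re-identifying an instance across views, the following is a minimal sketch and not the method of the paper. It assumes a hand-written matching cost that adds a weighted distance penalty to an appearance distance, whereas the paper learns geometry and appearance jointly and end-to-end. The names `cosine_distance`, `soft_match_cost`, and `geo_weight`, as well as the feature vectors and positions, are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def soft_match_cost(feat_a, feat_b, pos_a, pos_b, geo_weight=0.1):
    """Appearance distance plus a weighted geometric penalty.

    Hypothetical form: the geometric term only raises the cost of distant
    candidates, it never rules them out (a soft, not a hard, constraint).
    """
    appearance = cosine_distance(feat_a, feat_b)
    geometric = math.dist(pos_a, pos_b)  # distance between estimated positions
    return appearance + geo_weight * geometric


# Made-up example: a detection in view 1 and two candidates in view 2.
# The neighbour looks identical but stands 6 units away; the true match
# looks slightly different (viewpoint, lighting) but is only 0.5 units away.
query = {"feat": [1.0, 0.0], "pos": (0.0, 0.0)}
same_object = {"feat": [0.9, 0.1], "pos": (0.5, 0.0)}
neighbour = {"feat": [1.0, 0.0], "pos": (6.0, 0.0)}

cost_same = soft_match_cost(
    query["feat"], same_object["feat"], query["pos"], same_object["pos"]
)
cost_neighbour = soft_match_cost(
    query["feat"], neighbour["feat"], query["pos"], neighbour["pos"]
)

# Appearance alone would prefer the neighbour; the combined cost does not.
print(cost_same < cost_neighbour)
```

In this toy setting, appearance alone would pick the visually identical neighbour, while the combined cost favours the nearby candidate. This is the kind of ambiguity among similar neighbouring objects that the abstract names as a difficulty.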