CV · Mar 23, 2020

GeoGraph: Learning graph-based multi-view object detection with geometric cues end-to-end

arXiv:2003.10151v2 · 1 citation
AI Analysis

This work addresses the problem of accurately detecting and positioning static urban objects from multiple views, for applications such as autonomous driving and mapping. It represents a strong, specific gain rather than a foundational breakthrough.

The paper tackles multi-view object detection and re-identification in urban scenes by proposing an end-to-end Graph Neural Network that integrates geometric cues, resulting in a 2-6% gain in detection and re-ID average precision and an 8x reduction in training time.

In this paper we propose an end-to-end learnable approach that detects static urban objects from multiple views, re-identifies instances, and finally assigns a geographic position to each object. Our method relies on a Graph Neural Network (GNN) to detect all objects and output their geographic positions, given images and approximate camera poses as input. Our GNN simultaneously models relative pose and image evidence, and can handle an arbitrary number of input views. By jointly reasoning about visual appearance and relative pose, our method is robust to occlusion, to similar appearance of neighboring objects, and to severe changes in viewpoint. Experimental evaluation on two challenging, large-scale datasets and comparison with state-of-the-art methods show significant and systematic improvements in both accuracy and efficiency, with a 2-6% gain in detection and re-ID average precision as well as an 8x reduction in training time.
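To make the idea of a graph over multi-view detections more concrete, here is a minimal sketch. It is not the paper's architecture: the dictionary fields (`view`, `cam_xy`, `heading`, `feat`), the function names, and the choice of baseline length and heading difference as edge features are all assumptions made for illustration. It shows only two properties the abstract mentions: edges carry relative-pose information between views, and a permutation-invariant aggregation works for any number of views.

```python
import math
from itertools import combinations


def wrap_angle(theta):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def build_cross_view_edges(detections):
    """Connect every pair of detections that come from different views.

    Each detection is a dict with hypothetical keys:
      view    - integer id of the source image
      cam_xy  - approximate camera position (x, y) in metres
      heading - approximate camera heading in radians
      feat    - appearance feature vector (list of floats)
    Returns a list of (i, j, (baseline, heading_difference)) tuples.
    """
    edges = []
    for i, j in combinations(range(len(detections)), 2):
        a, b = detections[i], detections[j]
        if a["view"] == b["view"]:
            # Assumption: two detections in one image are distinct objects,
            # so they are never candidates for re-identification.
            continue
        baseline = math.hypot(b["cam_xy"][0] - a["cam_xy"][0],
                              b["cam_xy"][1] - a["cam_xy"][1])
        turn = wrap_angle(b["heading"] - a["heading"])
        edges.append((i, j, (baseline, turn)))
    return edges


def aggregate_neighbours(detections, edges):
    """One round of mean aggregation over each node and its neighbours.

    The mean does not depend on neighbour order or count, so the same code
    handles two views or twenty. The pose features on the edges are ignored
    here; a learned model would condition its messages on them.
    """
    sums = [list(d["feat"]) for d in detections]
    counts = [1] * len(detections)  # each node starts with its own feature
    for i, j, _pose in edges:
        for src, dst in ((i, j), (j, i)):  # treat edges as undirected
            for k, value in enumerate(detections[src]["feat"]):
                sums[dst][k] += value
            counts[dst] += 1
    return [[s / c for s in row] for row, c in zip(sums, counts)]
```

In a trained system the plain mean would be replaced by learned message and update functions, and the output would feed detection, re-identification, and geo-positioning heads; none of that is modelled in this sketch.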

