CV, LG · Dec 22, 2022

GOOD: Exploring Geometric Cues for Detecting Objects in an Open World

arXiv:2212.11720v3 · 14 citations · h-index: 94
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting novel objects in images for computer vision applications, offering an incremental but meaningful advance over existing RGB-based models.

The paper tackles open-world class-agnostic object detection by incorporating geometric cues such as depth and surface normals to reduce overfitting to the training classes. Trained on COCO with only the "person" class, it improves AR@100 by 5.0 percentage points over state-of-the-art methods, a relative gain of 24%.

We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models suffer from overfitting the training classes and often fail at detecting novel-looking objects. This is because RGB-based models primarily rely on appearance similarity to detect novel objects and are also prone to overfitting short-cut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators. Specifically, we use the geometric cues to train an object proposal network for pseudo-labeling unannotated novel objects in the training set. Our resulting Geometry-guided Open-world Object Detector (GOOD) significantly improves detection recall for novel object categories and already performs well with only a few training classes. Using a single "person" class for training on the COCO dataset, GOOD surpasses SOTA methods by 5.0% AR@100, a relative improvement of 24%.
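The headline metric above, AR@100, is the average recall over the 100 highest-scoring class-agnostic proposals. The following is a minimal single-image sketch of how such a number can be computed, not the paper's evaluation code. It assumes boxes in `[x1, y1, x2, y2]` format and the COCO-style IoU thresholds 0.50 to 0.95 in steps of 0.05, and it simplifies matching by crediting each ground-truth box with its best-overlapping proposal instead of enforcing one-to-one assignment. The function names `box_iou` and `average_recall_at_k` are illustrative.

```python
import numpy as np


def box_iou(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), format [x1, y1, x2, y2]."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    # Corners of the intersection rectangle for every (a, b) pair.
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def average_recall_at_k(proposals, scores, gt_boxes, k=100):
    """Class-agnostic average recall over the top-k proposals of one image.

    Simplification (assumption): a ground-truth box counts as recalled at a
    threshold if ANY top-k proposal overlaps it with at least that IoU.
    """
    thresholds = np.linspace(0.5, 0.95, 10)  # 0.50, 0.55, ..., 0.95
    gt = np.asarray(gt_boxes, dtype=float).reshape(-1, 4)
    props = np.asarray(proposals, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float).reshape(-1)
    if len(gt) == 0:
        return float("nan")  # recall is undefined without ground truth
    top = props[np.argsort(-scores)[:k]]  # keep the k highest-scoring boxes
    if len(top) == 0:
        return 0.0
    best_iou = box_iou(gt, top).max(axis=1)  # best overlap per ground-truth box
    recalls = [(best_iou >= t).mean() for t in thresholds]
    return float(np.mean(recalls))
```

In the open-world setting, the ground-truth boxes passed in would be restricted to novel categories, so the score reflects how well a detector trained on base classes recovers objects it was never labeled with. A dataset-level figure would pool ground-truth boxes across images before averaging.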

Code Implementations: 1 repo
