CV · Apr 16, 2024

OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery

arXiv:2404.10865v1 · 1 citation · h-index: 14 · BigData
Originality: Highly original
AI Analysis

This work addresses the need for object detectors to handle unknown objects in real-world applications, offering a modular solution with significant reported performance gains.

The paper tackles the problem of detecting novel objects in open-world deployments. It introduces the Open-Set Object Detection and Discovery (OSODD) task and proposes the OSR-ViT framework, which combines a class-agnostic proposal network with a ViT-based classifier. The authors report performance far exceeding state-of-the-art supervised methods, including in low-data settings.

An object detector's ability to detect and flag novel objects during open-world deployments is critical for many real-world applications. Unfortunately, much of the work in open object detection today is disjointed and fails to adequately address applications that prioritize unknown object recall in addition to known-class accuracy. To close this gap, we present a new task called Open-Set Object Detection and Discovery (OSODD) and as a solution propose the Open-Set Regions with ViT features (OSR-ViT) detection framework. OSR-ViT combines a class-agnostic proposal network with a powerful ViT-based classifier. Its modular design simplifies optimization and allows users to easily swap proposal solutions and feature extractors to best suit their application. Using our multifaceted evaluation protocol, we show that OSR-ViT obtains performance levels that far exceed state-of-the-art supervised methods. Our method also excels in low-data settings, outperforming supervised baselines using a fraction of the training data.
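
The modular design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `TwoStageOpenSetDetector`, `propose`, `extract`, and `classify` are hypothetical, the toy components stand in for a real proposal network and ViT feature extractor, and the confidence-threshold rule for flagging unknowns is an assumption made for illustration, since the abstract does not say how unknowns are flagged.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical component interfaces; any callables with these shapes can be swapped in.
ProposalFn = Callable[[object], List[Box]]                   # image -> class-agnostic boxes
FeatureFn = Callable[[object, Box], Sequence[float]]         # (image, box) -> feature vector
ClassifyFn = Callable[[Sequence[float]], Tuple[str, float]]  # features -> (label, confidence)


@dataclass
class Detection:
    box: Box
    label: str   # a known class name, or "unknown"
    score: float


@dataclass
class TwoStageOpenSetDetector:
    propose: ProposalFn
    extract: FeatureFn
    classify: ClassifyFn
    # Assumption for illustration only: low-confidence proposals are flagged as unknown.
    unknown_threshold: float = 0.5

    def detect(self, image: object) -> List[Detection]:
        detections = []
        for box in self.propose(image):
            label, confidence = self.classify(self.extract(image, box))
            if confidence < self.unknown_threshold:
                label = "unknown"
            detections.append(Detection(box, label, confidence))
        return detections


# Toy stand-ins so the sketch runs without any model weights.
def toy_propose(image: object) -> List[Box]:
    return [(0, 0, 10, 10), (5, 5, 6, 6)]


def toy_extract(image: object, box: Box) -> Sequence[float]:
    x1, y1, x2, y2 = box
    return [(x2 - x1) * (y2 - y1)]  # box area as a one-dimensional "feature"


def toy_classify(features: Sequence[float]) -> Tuple[str, float]:
    return ("cat", 0.9) if features[0] >= 50 else ("cat", 0.2)


detector = TwoStageOpenSetDetector(toy_propose, toy_extract, toy_classify)
results = detector.detect(image=None)
```

Because each stage is passed in as a plain callable, replacing the proposal stage or the feature extractor means changing one constructor argument, which is the kind of swap the abstract attributes to the framework.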

