CVMar 12, 2023

Universal Instance Perception as Object Discovery and Retrieval

arXiv:2303.06674v2251 citationsh-index: 105Has Code
Originality Incremental advance
AI Analysis

This work addresses the problem of fragmented instance perception tasks for computer vision researchers and practitioners, offering a parameter-efficient solution that leverages multi-task data, though it is incremental in unifying existing paradigms.

The authors tackled the fragmentation of instance perception tasks by proposing UNINEXT, a unified model that reformulates these tasks into object discovery and retrieval, achieving superior performance on 20 benchmarks across 10 tasks.

All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks. In this work, we present a universal instance perception model of the next generation, termed UNINEXT. UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts. This unified formulation brings the following benefits: (1) enormous data from different tasks and label vocabularies can be exploited for jointly training general instance-level representations, which is especially beneficial for tasks lacking in training data. (2) the unified model is parameter-efficient and can save redundant computation when handling multiple tasks simultaneously. UNINEXT shows superior performance on 20 challenging benchmarks from 10 instance-level tasks including classical image-level tasks (object detection and instance segmentation), vision-and-language tasks (referring expression comprehension and segmentation), and six video-level object tracking tasks. Code is available at https://github.com/MasterBin-IIAU/UNINEXT.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes