CVJul 14, 2022

Benchmarking Omni-Vision Representation through the Lens of Visual Realms

Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

arXiv:2207.07106v220.843 citationsh-index: 21Has Code

Originality Incremental advance

AI Analysis

This addresses the need for unbiased and efficient benchmarks in computer vision to evaluate general-purpose visual representations, though it is incremental as it builds on existing contrastive learning and benchmarking approaches.

The paper tackles the problem of evaluating omni-vision representations that generalize across many visual domains by proposing OmniBenchmark, a benchmark with 21 realm-wise datasets covering 7,372 concepts and 1,074,346 images without semantic overlapping, and introduces ReCo, a supervised contrastive learning framework that improves representation by pulling instances from the same realm closer, showing superior performance to other methods.

Though impressive performance has been achieved in specific visual realms (e.g. faces, dogs, and places), an omni-vision representation generalizing to many natural visual domains is highly desirable. But, existing benchmarks are biased and inefficient to evaluate the omni-vision representation -- these benchmarks either only include several specific realms, or cover most realms at the expense of subsuming numerous datasets that have extensive realm overlapping. In this paper, we propose Omni-Realm Benchmark (OmniBenchmark). It includes 21 realm-wise datasets with 7,372 concepts and 1,074,346 images. Without semantic overlapping, these datasets cover most visual realms comprehensively and meanwhile efficiently. In addition, we propose a new supervised contrastive learning framework, namely Relational Contrastive learning (ReCo), for a better omni-vision representation. Beyond pulling two instances from the same concept closer -- the typical supervised contrastive learning framework -- ReCo also pulls two instances from the same semantic realm closer, encoding the semantic relation between concepts, and facilitating omni-vision representation learning. We benchmark ReCo and other advances in omni-vision representation studies that are different in architectures (from CNNs to transformers) and in learning paradigms (from supervised learning to self-supervised learning) on OmniBenchmark. We illustrate the superior of ReCo to other supervised contrastive learning methods and reveal multiple practical observations to facilitate future research.

View on arXiv PDF Code

Similar