CV Feb 8, 2024

Enhancing Zero-shot Counting via Language-guided Exemplar Learning

Mingjie Wang, Jun Zhou, Yong Dai, Eric Buys, Minglun Gong

arXiv:2402.05394v13.72 citations h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the computer vision problem of counting arbitrary objects without class-specific training. It is an incremental improvement that integrates language models into counting tasks.

The paper tackles the Class-Agnostic Counting problem by proposing ExpressCount, a method that uses language-guided exemplar learning to enhance zero-shot object counting, achieving state-of-the-art performance with accuracy comparable to some Category-Specific Counting models.

Recently, the Class-Agnostic Counting (CAC) problem has garnered increasing attention owing to its generality and superior efficiency compared to Category-Specific Counting (CSC). This paper proposes ExpressCount, a novel method that enhances zero-shot object counting through language-guided exemplar learning. Specifically, ExpressCount comprises a Language-oriented Exemplar Perceptron and a downstream visual Zero-shot Counting pipeline. The perceptron extracts accurate exemplar cues from collaborative language-vision signals by inheriting rich semantic priors from pre-trained Large Language Models (LLMs), whereas the counting pipeline mines fine-grained features through dual-branch and cross-attention schemes, contributing to high-quality similarity learning. Beyond bridging LLMs and visual counting tasks, expression-guided exemplar estimation significantly advances zero-shot learning capabilities for counting instances of arbitrary classes. Moreover, FSC-147-Express, with its annotations of meticulous linguistic expressions, opens a new avenue for developing and validating language-based counting models. Extensive experiments demonstrate the state-of-the-art performance of our ExpressCount, with accuracy on par with some CSC models.
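The abstract mentions cross-attention and similarity learning between exemplar cues and image features without giving details. The following is a minimal NumPy sketch of that general idea, not the ExpressCount architecture: the function names, tensor shapes, and random features are all hypothetical, and a real model would use learned projections and a trained decoder to turn the similarity map into a density map.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query attends over all keys."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (num_queries, num_keys)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values, weights

def cosine_similarity_map(image_feats, exemplar_feat, eps=1e-8):
    """Cosine similarity of every image location (H, W, C) to one exemplar vector (C,)."""
    img = image_feats / (np.linalg.norm(image_feats, axis=-1, keepdims=True) + eps)
    ex = exemplar_feat / (np.linalg.norm(exemplar_feat) + eps)
    return img @ ex                          # (H, W)

# Hypothetical stand-ins for backbone features and exemplar tokens.
rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 16, 3
image_feats = rng.normal(size=(H, W, C))     # assumed image feature map
exemplar_tokens = rng.normal(size=(K, C))    # assumed exemplar embeddings

# Image locations (queries) attend over exemplar tokens (keys and values).
tokens = image_feats.reshape(H * W, C)
attended, weights = cross_attention(tokens, exemplar_tokens, exemplar_tokens)

# Compare the attended features against a pooled exemplar descriptor.
sim = cosine_similarity_map(attended.reshape(H, W, C), exemplar_tokens.mean(axis=0))
```

In this sketch, `sim` is an 8 x 8 map with values in [-1, 1]; in a counting model of this kind, a learned head would typically regress a density map from such features and sum it to obtain the predicted count.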
