CV · Sep 18, 2024

Agglomerative Token Clustering

arXiv:2409.11923v1 · 12 citations · h-index: 52
Originality: Highly original
AI Analysis

This paper addresses the challenge of efficiently reducing token counts in vision models, particularly at low keep rates, where preserving task performance is hardest. It offers a practical improvement for researchers and practitioners in computer vision.

The paper tackles token merging in vision tasks by introducing Agglomerative Token Clustering (ATC), which achieves state-of-the-art performance across image classification, image synthesis, and object detection & segmentation without extra learnable parameters. Applied off-the-shelf, i.e. without fine-tuning, it still performs on par with the prior state of the art.

We present Agglomerative Token Clustering (ATC), a novel token merging method that consistently outperforms previous token merging and pruning methods across image classification, image synthesis, and object detection & segmentation tasks. ATC merges clusters through bottom-up hierarchical clustering, without the introduction of extra learnable parameters. We find that ATC achieves state-of-the-art performance across all tasks, and can even perform on par with prior state-of-the-art when applied off-the-shelf, i.e. without fine-tuning. ATC is particularly effective when applied with low keep rates, where only a small fraction of tokens are kept and retaining task performance is especially difficult.
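The bottom-up merging described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `agglomerative_merge`, the use of cosine distance, average linkage, and mean-pooling of each cluster into one token are all assumptions made for illustration. The abstract only states that clusters are merged bottom-up without learnable parameters and that a keep rate controls how many tokens remain.

```python
import numpy as np

def agglomerative_merge(tokens, keep_rate):
    """Hypothetical helper: reduce (n, d) tokens to ceil(keep_rate * n) tokens.

    Assumptions (not taken from the paper): cosine distance,
    average linkage, and cluster mean as the merged token.
    """
    n = tokens.shape[0]
    n_keep = max(1, int(np.ceil(keep_rate * n)))

    # Pairwise cosine distance between the original tokens.
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.maximum(norms, 1e-12)
    dist = 1.0 - unit @ unit.T

    # Start with every token in its own cluster.
    clusters = [[i] for i in range(n)]

    # Repeatedly merge the two closest clusters (bottom-up).
    while len(clusters) > n_keep:
        best_d, best_a, best_b = np.inf, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]

    # One output token per cluster: the mean of its members.
    merged = np.stack([tokens[c].mean(axis=0) for c in clusters])
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return merged, labels


rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))          # 16 tokens of dimension 8
merged, labels = agglomerative_merge(tokens, keep_rate=0.25)
print(merged.shape)                        # (4, 8)
```

The loop is written for readability rather than speed. No parameters are learned at any step, which is why a procedure of this kind can in principle be applied to a pretrained model without fine-tuning.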

