CV · Jul 19, 2024

Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition

arXiv:2407.14302v2 · 4 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in PETL for vision tasks, offering a practical solution for resource-constrained applications, though it is incremental as it builds on existing PETL methods.

The paper tackles the problem of high computational complexity and inference burden in parameter-efficient transfer learning (PETL) for visual recognition by proposing Dyn-Adapter, which disentangles features at multiple levels to reduce FLOPs by 50% while maintaining or improving recognition accuracy.

Parameter-efficient transfer learning (PETL) is a promising task that aims to adapt large-scale pre-trained models to downstream tasks at a relatively modest cost. However, current PETL methods struggle to reduce computational complexity and bear a heavy inference burden because they run the complete forward pass. This paper presents an efficient visual recognition paradigm, called Dynamic Adapter (Dyn-Adapter), that boosts PETL efficiency by subtly disentangling features at multiple levels. Our approach is simple: first, we devise a dynamic architecture with balanced early heads for multi-level feature extraction, along with an adaptive training strategy. Second, we introduce a bidirectional sparsity strategy driven by the pursuit of powerful generalization ability. These qualities enable us to fine-tune efficiently and effectively: we reduce FLOPs during inference by 50% while maintaining or even improving recognition accuracy. Extensive experiments on diverse datasets and pre-trained backbones demonstrate the potential of Dyn-Adapter to serve as a general efficiency booster for PETL in visual recognition tasks.
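To make the idea of "early heads" concrete, the following is a minimal sketch of early-exit inference in plain Python. It is not the paper's implementation: the abstract does not say how an exit decision is made, so the confidence-threshold rule, the function name `early_exit_predict`, and the `stages`/`heads` callables are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def early_exit_predict(
    x: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run backbone stages in order and stop at the first confident head.

    Assumption (not from the paper): a head is "confident" when its top
    softmax probability reaches `threshold`. The last head always answers.
    Returns (predicted class index, number of stages executed).
    """
    if not stages or len(stages) != len(heads):
        raise ValueError("need one head per stage and at least one stage")

    features = x
    for depth, (stage, head) in enumerate(zip(stages, heads), start=1):
        features = stage(features)          # features at this depth
        probs = softmax(head(features))     # this depth's classifier head
        confidence = max(probs)
        if confidence >= threshold or depth == len(stages):
            return probs.index(confidence), depth

    raise RuntimeError("unreachable: the final head always returns")
```

Under this assumed rule, inputs that an early head classifies confidently skip the remaining stages, which is where any compute saving would come from; inputs that no early head is sure about still run the full forward pass.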

