Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets
This work addresses the problem of reducing the cost of transfer learning in computer vision, particularly for ConvNets, by tuning only a small fraction of parameters. The solution is incremental but broadly applicable across domains.
The paper tackles the under-studied question of how well parameter-efficient tuning (PET) methods work for large-scale ConvNets in computer vision. It proposes Conv-Adapter, a lightweight module that matches or exceeds full fine-tuning on 23 classification tasks while training only 3.5% of the parameters that full fine-tuning of ResNet50 updates, and that generalizes to detection and segmentation with more than a 50% reduction in parameters.
While parameter-efficient tuning (PET) methods have shown great potential with transformer architectures on Natural Language Processing (NLP) tasks, their effectiveness with large-scale ConvNets on Computer Vision (CV) tasks is still under-studied. This paper proposes Conv-Adapter, a PET module designed for ConvNets. Conv-Adapter is lightweight, domain-transferable, and architecture-agnostic, and it generalizes across different tasks. When transferring to downstream tasks, Conv-Adapter learns a task-specific modulation of the backbone's intermediate representations while keeping the pre-trained parameters frozen. It introduces only a small number of learnable parameters, e.g., only 3.5% of the parameters updated by full fine-tuning of ResNet50, and it can also be applied to transformer-based backbones. Conv-Adapter outperforms previous PET baseline methods and matches or surpasses full fine-tuning on 23 classification tasks from various domains. It also shows superior performance on few-shot classification, with an average margin of 3.39%. Beyond classification, Conv-Adapter generalizes to detection and segmentation tasks, reducing parameters by more than 50% while performing comparably to traditional full fine-tuning.
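The abstract describes the mechanism only at a high level: a small trainable module modulates the intermediate features of a frozen backbone. The following is a minimal NumPy sketch of that idea, assuming a parallel bottleneck adapter (a grouped k×k down-projection, ReLU, a zero-initialised 1×1 up-projection, and a per-channel scale) added to the output of one frozen convolutional stage. The layer layout, initialisation, and the names `FrozenBlock`, `ConvAdapterSketch`, and `adapted_forward` are hypothetical illustrations, not the authors' published design.

```python
import numpy as np


def conv2d(x, w, groups=1):
    """Naive stride-1, zero-padded ('same') 2-D convolution without bias.

    x: (C_in, H, W); w: (C_out, C_in // groups, k, k) with odd k.
    """
    c_in, h, wd = x.shape
    c_out, cpg_in, k, _ = w.shape
    assert c_in == cpg_in * groups and c_out % groups == 0
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    cpg_out = c_out // groups
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        g = o // cpg_out  # group that this output channel belongs to
        xs = xp[g * cpg_in:(g + 1) * cpg_in]
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(xs[:, i:i + k, j:j + k] * w[o])
    return out


def relu(x):
    return np.maximum(x, 0.0)


class FrozenBlock:
    """Stand-in for one pre-trained backbone stage; its weights are never updated."""

    def __init__(self, channels, k=3, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.w = rng.standard_normal((channels, channels, k, k)) * 0.1

    def __call__(self, x):
        return relu(conv2d(x, self.w))

    def num_params(self):
        return self.w.size


class ConvAdapterSketch:
    """Hypothetical parallel bottleneck adapter: the only trainable part."""

    def __init__(self, channels, reduction=4, k=3, rng=None):
        rng = np.random.default_rng(1) if rng is None else rng
        self.hidden = channels // reduction
        # Grouped k x k down-projection: each hidden channel sees `reduction` inputs.
        self.down = rng.standard_normal((self.hidden, channels // self.hidden, k, k)) * 0.1
        # 1x1 up-projection, zero-initialised so the adapter starts as a no-op.
        self.up = np.zeros((channels, self.hidden, 1, 1))
        # Per-channel scale on the modulation (an assumption for illustration).
        self.scale = np.ones(channels)

    def __call__(self, x):
        z = relu(conv2d(x, self.down, groups=self.hidden))
        delta = conv2d(z, self.up)
        return self.scale[:, None, None] * delta

    def num_params(self):
        return self.down.size + self.up.size + self.scale.size


def adapted_forward(block, adapter, x):
    # Frozen features plus a task-specific modulation computed from the same input.
    return block(x) + adapter(x)


x = np.random.default_rng(2).standard_normal((16, 8, 8))
block, adapter = FrozenBlock(16), ConvAdapterSketch(16)
print(np.allclose(adapted_forward(block, adapter, x), block(x)))  # True at init
print(adapter.num_params(), block.num_params())  # 224 2304
```

Only `adapter.down`, `adapter.up`, and `adapter.scale` would be updated during transfer; the backbone weights stay fixed, so each new task only needs to store the adapter. The trainable fraction in this toy stage (224 of 2304 frozen weights) is not comparable to the paper's 3.5% figure, which is measured over the whole ResNet50.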