Structured Pruning Adapters
This work addresses inference speed for users of parameter-efficient fine-tuning in computer vision, though it is incremental, as it builds on existing adapter and pruning methods.
The paper tackles slow inference in adapted models by proposing Structured Pruning Adapters (SPAs), which accelerate and specialize networks using structured pruning. Compared with regular structured pruning with fine-tuning, its channel-SPAs improve accuracy by 6.9% on average while using half the parameters at 90% pruning.
Adapters are a parameter-efficient alternative to fine-tuning, which augment a frozen base network to learn new tasks. Yet inference with the adapted model is often slower than with the corresponding fine-tuned model. To improve on this, we propose Structured Pruning Adapters (SPAs), a family of compressing, task-switching network adapters that accelerate and specialize networks using tiny parameter sets and structured pruning. Specifically, we propose a channel-based SPA and evaluate it with a suite of pruning methods on multiple computer vision benchmarks. Compared to regular structured pruning with fine-tuning, our channel-SPAs improve accuracy by 6.9% on average while using half the parameters at 90% pruned weights. Alternatively, they can learn adaptations with 17x fewer parameters at 70% pruning with 1.6% lower accuracy. Similarly, our block-SPA requires far fewer parameters than pruning with fine-tuning. Our experimental code and Python library of adapters are available at github.com/lukashedegaard/structured-pruning-adapters.
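To make the channel-SPA idea concrete, here is a minimal pure-Python sketch of a single linear layer. It rests on assumptions of our own, which are not necessarily the paper's exact parametrization or its library's API. First, the task-specific adapter is modelled as a LoRA-style low-rank update `up @ down` added to a frozen base weight. Second, output channels are pruned with a simple L1-magnitude criterion, which is only one of many possible pruning methods. All function names are hypothetical. The sketch shows where the savings come from. The base weight is shared across tasks, so each task stores only its small adapter and a channel mask. Meanwhile, the pruned channels are physically removed, leaving a smaller dense layer for inference.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def adapted_weight(base: Matrix, up: Matrix, down: Matrix) -> Matrix:
    """Frozen base weight plus a task-specific low-rank update (base + up @ down)."""
    delta = matmul(up, down)
    return [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(base, delta)]


def output_channel_mask(weight: Matrix, prune_ratio: float) -> List[bool]:
    """Assumed criterion: keep the output channels (rows) with the largest L1 norm."""
    scores = [sum(abs(x) for x in row) for row in weight]
    n_keep = max(1, round(len(scores) * (1.0 - prune_ratio)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [i in kept for i in range(len(scores))]


def compact(weight: Matrix, mask: List[bool]) -> Matrix:
    """Drop pruned rows, yielding a smaller dense weight for faster inference."""
    return [row for row, keep in zip(weight, mask) if keep]


def stored_params(out_dim: int, in_dim: int, rank: int, mask: List[bool]) -> Dict[str, int]:
    """Per-task storage: pruning with fine-tuning vs. this hypothetical adapter."""
    kept = sum(mask)
    return {
        # pruning + fine-tuning: a new dense weight for the kept channels
        "pruned_finetune": kept * in_dim,
        # adapter: rows of `up` for kept channels plus all of `down`
        "spa_adapter": rank * kept + rank * in_dim,
        # one mask bit per output channel
        "spa_mask_bits": out_dim,
    }


rng = random.Random(0)
out_dim, in_dim, rank = 64, 64, 4
base = rand_matrix(out_dim, in_dim, rng)   # shared and frozen across tasks
up = rand_matrix(out_dim, rank, rng)       # task-specific
down = rand_matrix(rank, in_dim, rng)      # task-specific

w_task = adapted_weight(base, up, down)
mask = output_channel_mask(w_task, prune_ratio=0.9)
w_small = compact(w_task, mask)
print(len(w_small), len(w_small[0]))       # 6 64
print(stored_params(out_dim, in_dim, rank, mask))
# {'pruned_finetune': 384, 'spa_adapter': 280, 'spa_mask_bits': 64}
```

The sketch prunes one layer in isolation for brevity. In a full network, pruning a layer's output channels also removes the matching input channels of the next layer. Both the compacted weights and the adapter would therefore shrink further.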