HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging
HydraOpt addresses the adapter storage challenge in resource-constrained environments such as mobile devices and offers a tunable trade-off between efficiency and performance, though the contribution is incremental over existing merging methods.
The paper tackles the memory cost of storing a separate adapter for each task in large language models by introducing HydraOpt, a model merging technique that reduces storage size by 48% relative to storing all adapters while limiting the performance drop to 0.2-1.8%.
Large language models (LLMs) often leverage adapters, such as low-rank-based adapters, to achieve strong performance on downstream tasks. However, storing a separate adapter for each task significantly increases memory requirements, posing a challenge for resource-constrained environments such as mobile devices. Although model merging techniques can reduce storage costs, they typically result in substantial performance degradation. In this work, we introduce HydraOpt, a new model merging technique that capitalizes on the inherent similarities between the matrices of low-rank adapters. Unlike existing methods that produce a fixed trade-off between storage size and performance, HydraOpt allows us to navigate this spectrum of efficiency and performance. Our experiments show that HydraOpt significantly reduces storage size (48% reduction) compared to storing all adapters, while achieving competitive performance (0.2-1.8% drop). Furthermore, it outperforms existing merging techniques in terms of performance at the same or slightly worse storage efficiency.
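To make the storage argument concrete, here is a minimal parameter-counting sketch. It is not HydraOpt's algorithm, which the abstract does not specify. It assumes a hypothetical scheme in which each low-rank adapter is a pair of matrices A (rank x d_in) and B (d_out x rank), and in which merging keeps one B per task while sharing a smaller pool of A matrices. The function names, the `num_shared` knob, and all dimensions below are illustrative assumptions, not values from the paper.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameter count of one low-rank adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def separate_storage(num_tasks: int, d_in: int, d_out: int, rank: int) -> int:
    """Baseline: every task stores its own full (A, B) pair."""
    return num_tasks * lora_params(d_in, d_out, rank)


def shared_a_storage(num_tasks: int, d_in: int, d_out: int, rank: int,
                     num_shared: int = 1) -> int:
    """Hypothetical merged layout (an assumption, not the paper's method):
    num_shared A matrices are shared, and each task keeps its own B."""
    if not 1 <= num_shared <= num_tasks:
        raise ValueError("num_shared must be between 1 and num_tasks")
    return num_shared * rank * d_in + num_tasks * d_out * rank


def reduction(baseline: int, merged: int) -> float:
    """Fraction of storage saved relative to the baseline."""
    return 1.0 - merged / baseline


if __name__ == "__main__":
    # Illustrative sizes only: 5 tasks, square 4096-dim layer, rank 16.
    tasks, d, r = 5, 4096, 16
    base = separate_storage(tasks, d, d, r)
    for k in range(1, tasks + 1):
        merged = shared_a_storage(tasks, d, d, r, num_shared=k)
        print(f"shared A matrices={k}: saved {reduction(base, merged):.0%}")
```

Under these assumptions, varying `num_shared` from 1 up to the number of tasks moves from the largest saving to none. This is one simple way to picture a tunable trade-off between storage size and performance, in contrast to merging methods that yield a single fixed point.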