LG Sep 16, 2025

HAM: Hierarchical Adapter Merging for Scalable Continual Learning

Eric Nuertey Coleman, Luigi Quarantiello, Samrat Mukherjee, Julio Hurtado, Vincenzo Lomonaco

arXiv:2509.13211v37.11 citations h-index: 9

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of scaling continual learning to dynamic scenarios and offers improved efficiency over existing methods, though it is an incremental advance that builds on Parameter-Efficient Fine-Tuning approaches.

The paper tackles catastrophic forgetting in continual learning by introducing Hierarchical Adapters Merging (HAM), a framework that dynamically combines adapters from different tasks. It significantly outperforms state-of-the-art methods on three vision benchmarks, especially as the number of tasks increases.

Continual learning is an essential capability of human cognition, yet it poses significant challenges for current deep learning models. The primary issue is that new knowledge can interfere with previously learned information, causing the model to forget earlier knowledge in favor of the new, a phenomenon known as catastrophic forgetting. Although large pre-trained models can partially mitigate forgetting by leveraging their existing knowledge and over-parameterization, they often struggle when confronted with novel data distributions. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, enable efficient adaptation to new knowledge. However, they still face challenges in scaling to dynamic learning scenarios and long sequences of tasks, as maintaining one adapter per task introduces complexity and increases the potential for interference. In this paper, we introduce Hierarchical Adapters Merging (HAM), a novel framework that dynamically combines adapters from different tasks during training. This approach enables HAM to scale effectively, allowing it to manage more tasks than competing baselines with improved efficiency. To achieve this, HAM maintains a fixed set of groups that hierarchically consolidate new adapters. For each task, HAM trains a low-rank adapter along with an importance scalar, then dynamically groups tasks based on adapter similarity. Within each group, adapters are pruned, scaled, and merged, facilitating transfer learning between related tasks. Extensive experiments on three vision benchmarks show that HAM significantly outperforms state-of-the-art methods, particularly as the number of tasks increases.
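The group, prune, scale, and merge steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each adapter is represented as a flattened weight-delta vector, that similarity is cosine similarity, that pruning keeps the largest-magnitude entries, and that merging is an importance-weighted sum. The function names (`prune_by_magnitude`, `merge_group`, `add_task`) and the parameters `keep_ratio` and `max_groups` are hypothetical, and the importance scalars are fixed by hand here rather than trained.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_by_magnitude(delta, keep_ratio):
    """Zero out all but the largest-magnitude entries (assumed pruning rule)."""
    k = max(1, int(round(len(delta) * keep_ratio)))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(delta)]


def merge_group(members, keep_ratio=0.5):
    """Prune each adapter, scale it by its importance scalar, and sum (assumed merge rule).

    members: non-empty list of (delta, importance) pairs.
    """
    merged = [0.0] * len(members[0][0])
    for delta, importance in members:
        pruned = prune_by_magnitude(delta, keep_ratio)
        for i, v in enumerate(pruned):
            merged[i] += importance * v
    return merged


def add_task(groups, delta, importance, max_groups):
    """Place a new adapter in a group and return the group's index.

    Assumed policy: open a new group while fewer than max_groups exist;
    afterwards, join the group whose merged adapter is most similar.
    """
    if len(groups) < max_groups:
        groups.append([(delta, importance)])
        return len(groups) - 1
    sims = [cosine_similarity(merge_group(g), delta) for g in groups]
    best = max(range(len(groups)), key=lambda i: sims[i])
    groups[best].append((delta, importance))
    return best


# Toy run: three tasks, at most two groups. Values are made up for illustration.
groups = []
tasks = [
    ([1.0, 0.1, 0.0, 0.0], 1.0),
    ([0.0, 0.0, 1.0, 0.2], 1.0),
    ([0.9, 0.2, 0.0, 0.0], 0.5),  # resembles the first task
]
assignments = [add_task(groups, d, a, max_groups=2) for d, a in tasks]
merged = [merge_group(g) for g in groups]
print(assignments)
print(merged)
```

In this toy run the third adapter joins the first task's group because their deltas point in nearly the same direction, so the number of stored merged adapters stays at two even though three tasks were seen.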
