CL, AI · Oct 14, 2024

Scalable Multi-Domain Adaptation of Language Models using Modular Experts

Peter Schafhalter, Shun Liao, Yanqi Zhou, Chih-Kuan Yeh, Arun Kandoor, James Laudon

arXiv:2410.10181

Originality: Highly original

AI Analysis

This paper addresses scalable, efficient domain adaptation for language models, particularly in resource-constrained settings such as edge devices, and proposes a novel method for this known bottleneck.

The paper tackles the challenge of adapting pre-trained language models to multiple domains efficiently while balancing target-domain performance, knowledge retention, and computational cost. It proposes Modular Domain Experts (MoDE), which matches full fine-tuning on target-domain performance while achieving 1.65% better retention and up to 38% faster training.

Domain-specific adaptation is critical to maximizing the performance of pre-trained language models (PLMs) on one or multiple targeted tasks, especially in resource-constrained use cases such as edge devices. However, existing methods often struggle to balance domain-specific performance, retention of general knowledge, and efficiency for training and inference. To address these challenges, we propose Modular Domain Experts (MoDE). MoDE is a mixture-of-experts architecture that augments a general PLM with modular, domain-specialized experts. These experts are trained independently and composed together via a lightweight training process. In contrast to standard low-rank adaptation methods, each MoDE expert consists of several transformer layers, which scale better with more training examples and larger parameter counts. Our evaluation demonstrates that MoDE achieves target performance comparable to full-parameter fine-tuning while achieving 1.65% better retention performance. Moreover, MoDE's architecture enables flexible sharding configurations and improves training speed by up to 38% over state-of-the-art distributed training configurations.
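To make the architecture more concrete, here is a minimal sketch of how a general-purpose path and several independently trained domain experts might be combined in a single forward pass. Everything beyond the abstract is an assumption. The names `ResidualBlock`, `LayerStack`, and `ModularMixture` are hypothetical. Small residual MLP blocks stand in for full transformer layers (there is no attention). The "lightweight" composition is modeled as a softmax gate over the backbone and expert outputs, and the gate is assumed to be the only component trained during composition. The sketch covers only the forward computation, not training, and it is not the authors' implementation.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ResidualBlock:
    """Hypothetical stand-in for one transformer layer: x + W2 * relu(W1 * x)."""

    def __init__(self, dim: int, hidden: int, rng: random.Random):
        self.w1 = rand_matrix(hidden, dim, rng)
        self.w2 = rand_matrix(dim, hidden, rng)

    def __call__(self, x: Vector) -> Vector:
        h = [max(0.0, v) for v in matvec(self.w1, x)]
        return [xi + di for xi, di in zip(x, matvec(self.w2, h))]


class LayerStack:
    """Several blocks in sequence: used both for the frozen backbone path and for
    each domain expert (assumed to be trained separately on its own domain)."""

    def __init__(self, dim: int, hidden: int, num_layers: int, rng: random.Random):
        self.layers = [ResidualBlock(dim, hidden, rng) for _ in range(num_layers)]

    def __call__(self, x: Vector) -> Vector:
        for layer in self.layers:
            x = layer(x)
        return x


class ModularMixture:
    """Combines backbone and expert outputs with an input-dependent softmax gate."""

    def __init__(self, backbone: LayerStack, experts: list[LayerStack], dim: int,
                 rng: random.Random):
        self.backbone = backbone  # general-purpose path, kept frozen
        self.experts = experts    # independently trained domain experts
        # One gate logit per path (backbone + each expert); assumed to be the only
        # parameters learned in the composition step.
        self.gate = rand_matrix(len(experts) + 1, dim, rng)

    def __call__(self, x: Vector) -> tuple[Vector, Vector]:
        outputs = [self.backbone(x)] + [expert(x) for expert in self.experts]
        weights = softmax(matvec(self.gate, x))
        mixed = [sum(w * out[i] for w, out in zip(weights, outputs))
                 for i in range(len(x))]
        return mixed, weights


rng = random.Random(0)
dim, hidden = 8, 16
backbone = LayerStack(dim, hidden, num_layers=2, rng=rng)
experts = [LayerStack(dim, hidden, num_layers=2, rng=rng) for _ in range(3)]
model = ModularMixture(backbone, experts, dim, rng)
y, w = model([1.0] * dim)
print(len(y), len(w), round(sum(w), 6))  # 8 4 1.0
```

In this sketch each expert is a self-contained stack of layers. That property is what would let experts be trained in isolation and then plugged in together, leaving only the small gate to be fitted jointly.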

