LG AI | Oct 17, 2023

Unlocking Emergent Modularity in Large Language Models

arXiv:2310.10908v226.137 citations | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work aims to improve the generalization of fine-tuned language models, a problem relevant to AI researchers and practitioners. It is an incremental advance because it builds on existing findings about emergent modularity.

The paper tackles the underutilization of emergent modularity in large language models by fine-tuning them as Mixture-of-Experts counterparts without extra parameters, resulting in improved in-domain and out-of-domain generalization, as demonstrated with models like Llama2-7B and Llama-30B.

Modular Neural Networks (MNNs) demonstrate various advantages over monolithic models. Existing MNNs are generally $\textit{explicit}$: their modular architectures are pre-defined, with individual modules expected to implement distinct functions. Recent work reveals that standard pre-trained transformers contain $\textit{implicit}$ modularity, namely $\textit{Emergent Modularity}$, and indicates that such modular structures emerge spontaneously during the early pre-training phase. Despite the benefits of modularity, most Language Models (LMs) are still treated as monolithic models in the pre-train and fine-tune paradigm, with their emergent modularity locked and underutilized. In this work, focusing on unlocking the emergent modularity in LMs, we show that standard LMs can be fine-tuned as their Mixture-of-Experts (MoE) counterparts without introducing any extra parameters. Such MoEs are derived from emergent modularity and are referred to as Emergent MoEs (EMoE). Our experiments demonstrate that fine-tuning EMoE effectively improves downstream in-domain and out-of-domain generalization compared with vanilla fine-tuning. Our analysis and ablation studies further illustrate that EMoE is robust to various configurations and can scale up to Large Language Models (i.e., Llama2-7B and Llama-30B). Code is available at https://github.com/qiuzh20/EMoE.
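
To make the "no extra parameters" idea concrete, here is a minimal sketch of how a dense feed-forward layer could be viewed as a Mixture-of-Experts by grouping its existing hidden neurons. It is an illustration under stated assumptions, not the authors' implementation: the function names, the contiguous neuron split (a stand-in for whatever grouping the method uses), the ReLU activation, and the gate built from the mean of each group's input weights are all hypothetical choices made for this example. The linked repository is the reference for the actual method.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dense_ffn(x, keys, values):
    """Dense FFN: sum over hidden neurons of relu(key . x) * value."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        act = relu(dot(k, x))
        for j, v_j in enumerate(v):
            out[j] += act * v_j
    return out


def partition_neurons(num_neurons, num_experts):
    """Assumption: contiguous equal-size split of neuron indices.

    num_neurons must be divisible by num_experts in this sketch.
    """
    size = num_neurons // num_experts
    return [list(range(e * size, (e + 1) * size)) for e in range(num_experts)]


def gate_vectors(keys, experts):
    """Assumption: one gate vector per expert, the mean of its neurons' keys.

    The gate reuses existing weights, so it adds no trainable parameters.
    """
    dim = len(keys[0])
    return [
        [sum(keys[i][d] for i in group) / len(group) for d in range(dim)]
        for group in experts
    ]


def emoe_ffn(x, keys, values, experts, top_k):
    """Run only the top_k highest-scoring neuron groups (top_k >= 1)."""
    scores = [dot(g, x) for g in gate_vectors(keys, experts)]
    ranked = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [i for e in chosen for i in experts[e]]
    out = dense_ffn(x, [keys[i] for i in idx], [values[i] for i in idx])
    return out, chosen


# Toy layer: 4 hidden neurons, input dimension 2, output dimension 1.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
experts = partition_neurons(len(keys), num_experts=2)
x = [1.0, 2.0]

sparse_out, active = emoe_ffn(x, keys, values, experts, top_k=1)
full_out, _ = emoe_ffn(x, keys, values, experts, top_k=2)
```

When every group is selected (`top_k` equal to the number of experts), the result matches `dense_ffn` exactly, which is why this view of the layer requires no new weights; choosing a smaller `top_k` activates only a subset of the original neurons.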
