CL LG | Oct 23, 2024

LEGO: Language Model Building Blocks

Shrenik Bhansali, Alwin Jin, Tyler Lizzo, Larry Heck

arXiv:2410.18287v1 | 1 citation | h-index: 1

Originality: Incremental advance

AI Analysis

LEGO addresses the cost and privacy problems that NLP practitioners face with LLMs, offering an incremental advance that combines model recombination with federated learning.

The paper tackles the high costs and privacy issues of large language models (LLMs) by proposing LEGO, a technique to extract and recombine small language models (SLMs) from an LLM, resulting in efficient, task-specific models that preserve privacy and maintain robustness.

Large language models (LLMs) are essential in natural language processing (NLP) but are costly in data collection, pre-training, fine-tuning, and inference. Task-specific small language models (SLMs) offer a cheaper alternative but lack robustness and generalization. This paper proposes LEGO, a novel technique to extract SLMs from an LLM and recombine them. Using state-of-the-art LLM pruning strategies, we can create task- and user-specific SLM building blocks that are efficient for fine-tuning and inference while also preserving user data privacy. LEGO utilizes Federated Learning and a novel aggregation scheme for the LLM reconstruction, maintaining robustness without high costs and preserving user data privacy. We experimentally demonstrate the versatility of LEGO, showing its ability to enable model heterogeneity and mitigate the effects of data heterogeneity while maintaining LLM robustness.
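The abstract does not spell out the pruning strategy or the aggregation scheme, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes simple magnitude pruning to carve differently sized SLMs out of one flat weight vector, and a mask-aware average to recombine them. The names `magnitude_prune` and `masked_average` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Mask = List[bool]


def magnitude_prune(weights: Vector, sparsity: float) -> Tuple[Vector, Mask]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: plain magnitude pruning stands in for the unspecified
    "state-of-the-art LLM pruning strategies" mentioned in the abstract.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_drop = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_drop are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_drop])
    mask = [i not in dropped for i in range(len(weights))]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask


def masked_average(
    global_weights: Vector,
    client_weights: List[Vector],
    client_masks: List[Mask],
) -> Vector:
    """Recombine heterogeneous SLMs into one full-size weight vector.

    Assumption: each position is averaged over the clients that kept it;
    positions pruned by every client fall back to the global value.
    """
    merged = []
    for i, g in enumerate(global_weights):
        kept = [w[i] for w, m in zip(client_weights, client_masks) if m[i]]
        merged.append(sum(kept) / len(kept) if kept else g)
    return merged


# Two clients extract SLMs of different sizes from the same global model.
global_w = [0.5, -0.125, 0.75, 0.0625]
slm_a, mask_a = magnitude_prune(global_w, 0.5)   # keeps 2 of 4 weights
slm_b, mask_b = magnitude_prune(global_w, 0.25)  # keeps 3 of 4 weights

# Stand-in for local fine-tuning: hand-written updates on kept positions only.
tuned_a = [0.75, 0.0, 0.5, 0.0]
tuned_b = [0.25, -0.25, 1.0, 0.0]

merged = masked_average(global_w, [tuned_a, tuned_b], [mask_a, mask_b])
```

In this sketch only weight vectors and masks reach the aggregation step, never the clients' training data, which is the sense in which a federated setup of this kind can keep user data local. Averaging per position over the clients that retained it is one simple way to tolerate SLMs of different sizes.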
