CL | AI | LG | Nov 11, 2024

LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models

arXiv:2411.06839v2 | 5 citations | h-index: 11 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses model compression for LLMs, offering a more efficient distillation method, but it is incremental as it builds on existing KD and LoRA techniques.

The paper tackles the compression of large language models (LLMs) by proposing LLM-NEO, a parameter-efficient knowledge distillation method that integrates Low-Rank Adaptation (LoRA) to make knowledge transfer more efficient. In experiments on compressing Llama 2 and Llama 3.2, it outperforms the baselines.

Knowledge distillation (KD) has been a predominant method for compressing Large Language Models (LLMs). In this paper, we first revisit KD and Low-Rank Adaptation (LoRA) and demonstrate that they follow the same paradigm. Inspired by this observation, we propose a parameter-efficient knowledge distillation method, LLM-NEO, which integrates LoRA into KD to improve the efficiency of knowledge transfer. We then summarize some valuable guidelines for the hyperparameters in LLM-NEO. Experimental results on compressing Llama 2 and Llama 3.2 show that LLM-NEO outperforms various baselines. Further analysis demonstrates the robustness of the proposed LLM-NEO on variants of LoRA. The code and trained models are available on [GitHub](https://github.com/yang3121099/LLM-Neo).
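
To make the idea of combining LoRA with KD concrete, here is a minimal sketch in plain Python. It is not taken from the LLM-NEO repository. The function names (`kd_loss`, `lora_forward`), the `kd_weight` and `temperature` parameters, and the specific mix of cross-entropy with a teacher-to-student KL term are illustrative assumptions; the abstract does not state the exact objective. The sketch shows only the two ingredients: a student layer whose frozen weight `W` is augmented by a trainable low-rank update `B A`, and a distillation loss that pulls the student's output distribution toward the teacher's.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, kd_weight=0.5, temperature=1.0):
    # Assumed objective: (1 - kd_weight) * CE(student, label)
    #                    + kd_weight * KL(teacher || student).
    ce = -math.log(softmax(student_logits)[label])
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)
    return (1 - kd_weight) * ce + kd_weight * kl

def matvec(M, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    # y = W x + (alpha / r) * B (A x). W stays frozen; only A and B would be trained.
    r = len(A)  # rank of the low-rank update
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + (alpha / r) * l for b, l in zip(base, low_rank)]

# Toy example: 2 input features, 3 output classes, rank-1 adapter.
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]   # frozen student weight (3 x 2)
A = [[0.1, 0.2]]                              # trainable down-projection (1 x 2)
B_zero = [[0.0], [0.0], [0.0]]                # up-projection, zero-initialised (3 x 1)
x = [1.0, 2.0]

student_logits = lora_forward(W, A, B_zero, x)
teacher_logits = [0.1, 1.5, -0.7]             # hypothetical teacher output
loss = kd_loss(student_logits, teacher_logits, label=1)
print(f"distillation loss: {loss:.4f}")
```

With `B` initialised to zero, the adapted student starts out identical to the frozen base layer, so training only has to learn the low-rank correction that moves the student toward the teacher.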

Code Implementations: 2 repos
