CEAI, Apr 19, 2024

When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering

arXiv:2404.13028v1, 4 citations, h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the challenge of keeping LLMs current and efficient in real-world applications, though it appears incremental as it builds on existing pre-training methods.

The paper tackles the problem of catastrophic forgetting and double descent in continued pre-training of large language models (LLMs) by introducing the LLM-ADE framework, which uses dynamic architectural adjustments to achieve significant performance improvements on general knowledge benchmarks.

This paper presents the LLM-ADE framework, a novel methodology for continued pre-training of large language models (LLMs) that addresses the challenges of catastrophic forgetting and double descent. LLM-ADE employs dynamic architectural adjustments, including selective block freezing and expansion, tailored to specific datasets. This strategy enhances model adaptability to new data while preserving previously acquired knowledge. We demonstrate LLM-ADE's effectiveness on the TinyLlama model across various general knowledge benchmarks, showing significant performance improvements without the drawbacks of traditional continuous training methods. This approach promises a more versatile and robust way to keep LLMs current and efficient in real-world applications.
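The abstract names selective block freezing and block expansion but does not say how blocks are chosen. The following is a minimal sketch of the bookkeeping only, not the authors' implementation. The `Block` class, the `freeze_and_expand` helper, and the chosen indices are all hypothetical, and the sketch assumes that newly added blocks stay trainable while the selected original blocks are frozen.

```python
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one transformer block."""
    name: str
    trainable: bool = True


def freeze_and_expand(
    blocks: List[Block], freeze: Set[int], expand_after: Set[int]
) -> List[Block]:
    """Freeze the blocks at `freeze` and insert a new trainable block
    after each index in `expand_after` (assumed policy, for illustration)."""
    adapted: List[Block] = []
    for index, block in enumerate(blocks):
        # Frozen blocks keep their weights fixed during continued pre-training.
        if index in freeze:
            adapted.append(replace(block, trainable=False))
        else:
            adapted.append(block)
        # Expansion adds capacity for the new data next to the chosen block.
        if index in expand_after:
            adapted.append(Block(name=f"{block.name}_expanded", trainable=True))
    return adapted


base = [Block(f"block_{i}") for i in range(4)]
adapted = freeze_and_expand(base, freeze={0, 1, 2}, expand_after={1, 3})
trainable_names = [b.name for b in adapted if b.trainable]
```

In a real model the same plan would translate into disabling gradients on the frozen blocks and initializing each inserted block so that it does not disturb the existing outputs. How LLM-ADE selects blocks per dataset is not described in this summary.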

