LG · AI · CL · FL · Feb 24, 2025

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

arXiv:2502.17535v1 · 16 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This paper identifies a critical gap in LLM compression research, one that could affect developers and researchers who deploy models under resource constraints.

The paper argues that current LLM compression methods focus too narrowly on preserving basic task performance and overlook capabilities that modern LLMs rely on, such as multi-step reasoning and tool use. It proposes the 'lottery LLM hypothesis': for a given LLM and task, a smaller compressed model can match the original's performance when augmented with multi-step reasoning and external tools.

Motivated by reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted much attention from researchers. However, current methods predominantly emphasize maintaining the performance of compressed LLMs, as measured by perplexity or simple accuracy on tasks of common sense knowledge QA and basic arithmetic reasoning. In this blog, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. Then, we propose a lottery LLM hypothesis suggesting that for a given LLM and task, there exists a smaller lottery LLM capable of producing the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on the review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, which are currently overlooked in existing methods.
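
The hypothesis stated in the abstract can be written down compactly. The following is only a sketch of one possible formalization: the symbols ($f_\theta$, $g_\phi$, $\mathcal{A}$, $P$, $q$) are notation introduced here for illustration and are not taken from the abstract, and "same performance" is read as "no worse than the original".

```latex
% Illustrative notation (assumed, not from the abstract):
%   f_\theta     : the original LLM with parameters \theta
%   g_\phi       : a candidate smaller ("lottery") LLM with parameters \phi
%   q            : an input from the given task
%   \mathcal{A}  : a procedure that runs g_\phi with multi-step reasoning
%                  and external tools
%   P            : a performance measure on outputs for the task
\[
  \exists\, g_\phi \ \text{with}\ |\phi| < |\theta|
  \quad \text{such that} \quad
  P\bigl(\mathcal{A}(g_\phi, q)\bigr) \;\ge\; P\bigl(f_\theta(q)\bigr).
\]
```

Under this reading, compression is judged by whether the augmented small model still reaches the original's performance, not by the small model's standalone perplexity or accuracy.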

