CL, AI, LG · Jun 3, 2025

Hopscotch: Discovering and Skipping Redundancies in Language Models

arXiv:2506.03303v2 · 1 citation · h-index: 5 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses inference efficiency in large language models for users who need faster inference with minimal accuracy loss. It is an incremental advance that builds on existing model compression techniques.

The paper tackles the problem of redundant attention blocks in causal language models by proposing Hopscotch, a method that identifies and skips the least important blocks while adapting to preserve output quality, achieving less than a 2% performance drop when skipping four blocks in models like Llama-3.1-8B and Qwen2.5-7B.

Modern causal language models stack many attention blocks to improve performance, but not all blocks are necessary for every task. We propose Hopscotch, a simple yet effective method that identifies and skips attention blocks with least contributions to a task and adapts to preserve output quality. Hopscotch jointly optimizes which blocks to skip and how to scale the outputs of the remaining layers. By introducing lightweight, trainable scaling parameters to attention and MLP blocks, it mitigates distribution shifts in hidden states caused by removing attention blocks. Hopscotch does not modify model weights or require access to pretraining or instruction-tuning data, and is compatible with existing model compression techniques. When applied to $\texttt{Llama-3.1-8B}$ and $\texttt{Qwen2.5-7B}$, Hopscotch achieves less than a 2% drop in performance even after skipping four attention blocks.
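To make the mechanism in the abstract concrete, here is a minimal sketch of a residual stack in which selected attention blocks are skipped and the outputs of the remaining attention and MLP blocks are rescaled. It is an illustration under stated assumptions, not the authors' implementation: the names `Block`, `forward`, `toy_attn`, and `toy_mlp` are hypothetical, the scales are assumed to be one scalar per block (the paper's parametrization may differ), layer normalization is omitted, and the procedure for choosing which blocks to skip and for training the scales is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class Block:
    """One transformer layer reduced to two residual updates (hypothetical)."""
    attn: Callable[[Vector], Vector]  # stand-in for the attention sub-block
    mlp: Callable[[Vector], Vector]   # stand-in for the MLP sub-block
    attn_scale: float = 1.0           # assumed scalar scale on the attention output
    mlp_scale: float = 1.0            # assumed scalar scale on the MLP output
    skip_attn: bool = False           # True means this attention block is skipped


def add_scaled(h: Vector, update: Vector, scale: float) -> Vector:
    """Residual update: h + scale * update, element-wise."""
    return [x + scale * u for x, u in zip(h, update)]


def forward(blocks: List[Block], h: Vector) -> Vector:
    """Run the hidden state through the stack, skipping flagged attention blocks."""
    for block in blocks:
        if not block.skip_attn:
            h = add_scaled(h, block.attn(h), block.attn_scale)
        # The MLP is always applied; its scale can compensate for the shift
        # in hidden-state statistics caused by a skipped attention block.
        h = add_scaled(h, block.mlp(h), block.mlp_scale)
    return h


def toy_attn(h: Vector) -> Vector:
    """Toy 'attention': broadcast the mean of the hidden state."""
    mean = sum(h) / len(h)
    return [mean for _ in h]


def toy_mlp(h: Vector) -> Vector:
    """Toy 'MLP': halve each element."""
    return [0.5 * x for x in h]


full = forward([Block(toy_attn, toy_mlp)], [1.0, 3.0])
skipped = forward([Block(toy_attn, toy_mlp, skip_attn=True)], [1.0, 3.0])
rescaled = forward(
    [Block(toy_attn, toy_mlp, mlp_scale=2.0, skip_attn=True)], [1.0, 3.0]
)
```

In this toy setting, skipping the attention update changes the hidden state, and adjusting `mlp_scale` moves it again. In the method described above the scales would be trained rather than set by hand, and the base model weights would stay frozen.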

