LG · AI · CL · Jun 27, 2024

The Remarkable Robustness of LLMs: Stages of Inference?

arXiv:2406.19384v3 · 128 citations
Originality: Incremental advance
AI Analysis

This work provides a framework for interpreting depth-dependent computations in LLMs. The advance is incremental, but it offers researchers and practitioners insight into model robustness.

The study investigated the robustness of Large Language Models (LLMs) to structural interventions, namely deleting and swapping adjacent layers during inference. Models retained 72-95% of their original top-1 prediction accuracy without fine-tuning, and performance degradation was most severe for interventions on the early and final layers.

We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. We find that performance degradation is not uniform across layers: interventions to the early and final layers cause the most degradation, while the model is remarkably robust to dropping middle layers. This pattern of localized sensitivity motivates our hypothesis of four stages of inference, observed across diverse model families and sizes: (1) detokenization, where local context is integrated to lift raw token embeddings into higher-level representations; (2) feature engineering, where task- and entity-specific features are iteratively refined; (3) prediction ensembling, where hidden states are aggregated into plausible next-token predictions; and (4) residual sharpening, where irrelevant features are suppressed to finalize the output distribution. Synthesizing behavioral and mechanistic evidence, we provide a framework for interpreting depth-dependent computations in LLMs.
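The two interventions described above, deleting a layer and swapping two adjacent layers, can be illustrated on a toy residual stack. This is a minimal sketch, not the authors' implementation: the layer `x -> x + tanh(W x)`, the helper names (`make_toy_layer`, `drop_layer`, `swap_adjacent`, `top1_agreement`), and the agreement metric are all hypothetical stand-ins chosen for illustration. No real LLM or library API is involved.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(dim: int, rng: random.Random) -> Layer:
    """Hypothetical residual block x -> x + tanh(W x) with random weights."""
    scale = 1.0 / math.sqrt(dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def layer(x: Vector) -> Vector:
        update = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
        return [xi + ui for xi, ui in zip(x, update)]

    return layer


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers in order; each one adds its update to the running state."""
    for layer in layers:
        x = layer(x)
    return x


def drop_layer(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a copy of the stack with layer i deleted."""
    if not 0 <= i < len(layers):
        raise IndexError("layer index out of range")
    return [layer for j, layer in enumerate(layers) if j != i]


def swap_adjacent(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a copy of the stack with layers i and i + 1 exchanged."""
    if not 0 <= i < len(layers) - 1:
        raise IndexError("need a valid pair of adjacent layers")
    swapped = list(layers)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped


def argmax(x: Vector) -> int:
    return max(range(len(x)), key=x.__getitem__)


def top1_agreement(baseline: Sequence[Layer], variant: Sequence[Layer],
                   inputs: Sequence[Vector]) -> float:
    """Fraction of inputs on which both stacks pick the same top-1 index.

    This is an assumed proxy for the retained top-1 accuracy in the text.
    """
    matches = sum(argmax(run(baseline, x)) == argmax(run(variant, x)) for x in inputs)
    return matches / len(inputs)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, depth = 8, 6
    stack = [make_toy_layer(dim, rng) for _ in range(depth)]
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
    for i in range(depth):
        print("drop", i, top1_agreement(stack, drop_layer(stack, i), inputs))
    for i in range(depth - 1):
        print("swap", i, top1_agreement(stack, swap_adjacent(stack, i), inputs))
```

Both helpers return a new list rather than mutating the stack, so every intervention is compared against the same unmodified baseline. A random toy stack is not expected to reproduce the depth-dependent sensitivity reported for trained models. The sketch only shows the mechanics of the interventions and of a per-layer comparison.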

Code Implementations: 1 repo
