LG · AI · CL · May 1

LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference

arXiv:2605.01058 · 17.2 · h-index: 3
AI Analysis

For practitioners deploying transformer models, LEAP provides a method to combine distillation and early exit for significant inference speedups without architectural changes.

LEAP introduces an auxiliary training objective that reconciles the incompatibility between layer-aligned distillation and convergence-based early exit, enabling efficient transformer inference. LEAP-MiniLM achieves 1.61× wall-clock speedup with 91.9% of samples exiting by layer 7, while standard distilled models achieve zero effective speedup.

Layer-aligned distillation and convergence-based early exit are two predominant efficiency paradigms for transformer inference, yet we establish that they are systematically incompatible under the standard deployment conditions for early exit. Distillation objectives that align intermediate student layers to teacher representations suppress the representational convergence that early-exit mechanisms exploit, rendering such mechanisms ineffective on distilled models. We introduce LEAP (Layer-wise Exit-Aware Pretraining), an auxiliary training objective that reconciles this incompatibility. LEAP requires no architectural modifications; it augments standard distillation with a single constraint ensuring that intermediate layers approximate final-layer representations. LEAP-MiniLM achieves 1.61$\times$ measured wall-clock speedup (batch=1, NVIDIA L4) at $\theta$=0.95, with 91.9% of samples exiting by layer 7 and 1.80$\times$ theoretical layer reduction, whereas standard distilled models achieve zero effective speedup. We validate across sentence similarity (STS-B: 0.760 $\pm$ 0.006) and retrieval benchmarks (BEIR), and provide operational guidance including latency measurements, decision thresholds, and deployment criteria.
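
The two mechanisms named in the abstract can be illustrated with a small plain-Python sketch. It is not the authors' implementation: the abstract does not say how convergence is measured or how the constraint is formulated. Here convergence is assumed to mean cosine similarity between consecutive layer outputs reaching the threshold θ, and the constraint is assumed to be a mean `1 - cosine` distance between each intermediate layer and the final layer. The names `cosine`, `exit_aware_penalty`, and `early_exit` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exit_aware_penalty(hidden_states: Sequence[Sequence[float]]) -> float:
    """Assumed form of the auxiliary constraint: mean (1 - cosine) distance
    from every intermediate layer output to the final layer output."""
    *intermediate, final = hidden_states
    if not intermediate:
        return 0.0
    return sum(1.0 - cosine(h, final) for h in intermediate) / len(intermediate)


def early_exit(
    layers: Sequence[Callable[[Vector], Vector]],
    x: Vector,
    theta: float = 0.95,
) -> Tuple[Vector, int]:
    """Assumed convergence rule: run layers in order and stop as soon as two
    consecutive outputs have cosine similarity >= theta.

    Returns the last computed representation and the number of layers executed.
    """
    prev = layers[0](x)
    for depth, layer in enumerate(layers[1:], start=2):
        cur = layer(prev)
        if cosine(prev, cur) >= theta:
            return cur, depth  # converged: skip the remaining layers
        prev = cur
    return prev, len(layers)  # never converged: full depth was used
```

Under these assumptions, a training loop would add a weighted `exit_aware_penalty` to the usual distillation loss (the weight is a further hypothetical hyperparameter), and inference would call `early_exit` with `theta=0.95` to match the threshold reported in the abstract. A penalty of this kind pulls intermediate outputs toward the final one, which is the convergence the exit rule needs in order to fire before the last layer.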
