LG · AI · May 7

Budgeted Attention Allocation: Cost-Conditioned Compute Control for Efficient Transformers

arXiv:2605.05697 · 42.0 · h-index: 2
AI Analysis

For practitioners who need multiple cost-quality trade-offs from a single model, this paper offers a feasibility study of controllable attention budgets, though it does not universally outperform fixed-budget specialists.

This paper introduces Budgeted Attention Allocation, a method for controlling inference cost in transformers by conditioning attention head gating on a requested budget. On AG News with pretrained BERT-Mini, it reaches 87.6% accuracy with a 1.20x measured single-thread CPU speedup at budget 0.50. On DBpedia14, BERT-Mini budgeted gates reach 97.4% accuracy at budget 0.50, versus 96.6% for dense full attention.

Transformers usually expose one inference cost per trained model, while deployed systems often need multiple cost-quality operating points. We study Budgeted Attention Allocation, a monotone head-gating mechanism conditioned on a requested attention budget. Dense warm-starting is important for stability: on a robust synthetic sequence task, one budgeted model reaches 99.7% accuracy at 0.303 estimated attention cost and 100.0% accuracy at 0.504 cost. On held-out AG News with a custom word-level transformer, hard-gate adaptation turns soft cost control into measured single-thread CPU speed, reaching 82.1% accuracy with 1.28x speedup at budget 0.50. In pretrained BERT-Mini AG News, budgeted structural pruning reaches 87.6% accuracy with 1.20x speedup at budget 0.50; a validation-ranked zero-shot dense post-hoc structural baseline reaches 86.1%, and one recovery epoch raises that per-budget specialist to 87.9%. On DBpedia14, BERT-Mini budgeted gates reach 97.4% at exact budget 0.50 versus 96.6% for dense full attention. Static fixed-budget gates and recovered dense specialists remain strong. The contribution is therefore not universal dominance, but a reproducible feasibility study of one controllable checkpoint across budgets that can trade attention cost for accuracy and be converted into measured structural speedups on small CPU benchmarks.
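To make the idea of a monotone, budget-conditioned head gate concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not specify how gates are parameterized or trained, so this sketch assumes a fixed per-head importance score and keeps the top-scoring heads up to the requested budget. The names `hard_head_mask`, `attention_cost`, and the example scores are hypothetical. The point is only to show why a fixed ranking gives nested (monotone) masks, so that raising the budget never turns off a head that a lower budget kept.

```python
import math


def hard_head_mask(scores, budget):
    """Return a 0/1 mask keeping the top ceil(budget * H) heads by score.

    Assumption (not from the paper): each head has a fixed importance
    score, and the gate is a hard top-k selection over that ranking.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be in [0, 1]")
    n_heads = len(scores)
    # Small epsilon guards against float noise such as 2.0000000001
    n_keep = math.ceil(budget * n_heads - 1e-9)
    # Rank heads by descending score; the stable sort breaks ties by index
    order = sorted(range(n_heads), key=lambda h: -scores[h])
    keep = set(order[:n_keep])
    return [1 if h in keep else 0 for h in range(n_heads)]


def attention_cost(mask):
    """Fraction of heads left active, used here as a stand-in for cost."""
    return sum(mask) / len(mask)


def is_nested(low_mask, high_mask):
    """True if every head kept at the lower budget is kept at the higher one."""
    return all(hi >= lo for lo, hi in zip(low_mask, high_mask))


if __name__ == "__main__":
    example_scores = [0.9, 0.1, 0.5, 0.7]  # hypothetical head importances
    for b in (0.25, 0.50, 1.00):
        mask = hard_head_mask(example_scores, b)
        print(b, mask, attention_cost(mask))
```

Because the ranking does not depend on the budget, masks at increasing budgets form a nested family, which is one simple way to obtain monotonicity. A learned gate conditioned on the budget, as the abstract describes, would need its own mechanism to guarantee the same property.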

