LG · Feb 4

Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning

arXiv:2602.04491v1 · 1 citation
AI Analysis

This work addresses the need for more energy-efficient transformer deployment in Green AI, offering an incremental improvement over existing pruning methods by dynamically updating importance scores.

The paper tackles the problem of static importance scores in attention head pruning for transformer model compression. It proposes Greedy-Gnorm, a dynamic algorithm that recalculates head importance from gradient norms after each pruning step. In the reported experiments on models such as BERT and RoBERTa, it preserves accuracy under substantial head removal and outperforms attention entropy.

Attention head pruning has emerged as an effective technique for transformer model compression, an increasingly important goal in the era of Green AI. However, existing pruning methods often rely on static importance scores, which fail to capture the evolving role of attention heads during iterative removal. We propose Greedy-Gradient norm (Greedy-Gnorm), a novel head pruning algorithm that dynamically recalculates head importance after each pruning step. Specifically, each head is scored by the elementwise product of the ℓ2-norms of its Q/K/V gradient blocks, as estimated from a hold-out validation set and updated at every greedy iteration. This dynamic scoring mitigates stale rankings and better reflects gradient-informed importance as pruning progresses. Extensive experiments on BERT, ALBERT, RoBERTa, and XLM-RoBERTa demonstrate that Greedy-Gnorm consistently preserves accuracy under substantial head removal, outperforming attention entropy. By effectively reducing model size while maintaining task performance, Greedy-Gnorm offers a promising step toward more energy-efficient transformer model deployment.
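
The scoring rule and greedy loop described in the abstract can be sketched in a few lines of plain Python. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions the abstract does not confirm: each head owns a contiguous block of `d_head` rows in the Q, K and V gradient matrices, a head's score is the product of the three block norms, and the lowest-scoring head is removed at each step. The names `block_norm`, `head_scores`, `greedy_prune` and the `compute_grads` callback are hypothetical.

```python
import math


def block_norm(grad, head, d_head):
    """l2 (Frobenius) norm of the rows of `grad` assumed to belong to `head`."""
    rows = grad[head * d_head:(head + 1) * d_head]
    return math.sqrt(sum(x * x for row in rows for x in row))


def head_scores(grad_q, grad_k, grad_v, active_heads, d_head):
    """Score each active head by the product of its Q, K and V block norms."""
    return {
        h: block_norm(grad_q, h, d_head)
        * block_norm(grad_k, h, d_head)
        * block_norm(grad_v, h, d_head)
        for h in sorted(active_heads)
    }


def greedy_prune(compute_grads, num_heads, d_head, num_to_prune):
    """Remove one head per iteration, re-scoring the survivors every time.

    `compute_grads(active_heads)` is a hypothetical callback that returns
    (grad_q, grad_k, grad_v) measured on held-out data with only the
    heads in `active_heads` enabled.
    """
    active = set(range(num_heads))
    pruned = []
    for _ in range(num_to_prune):
        grad_q, grad_k, grad_v = compute_grads(active)
        scores = head_scores(grad_q, grad_k, grad_v, active, d_head)
        # Assumption: the head with the smallest score is pruned first.
        victim = min(scores, key=scores.get)
        active.remove(victim)
        pruned.append(victim)
    return pruned


# Toy example: two heads, one row per head.
toy_q = [[3.0, 4.0], [1.0, 0.0]]
toy_k = [[1.0, 0.0], [0.0, 2.0]]
toy_v = [[2.0, 0.0], [0.0, 3.0]]
toy_scores = head_scores(toy_q, toy_k, toy_v, {0, 1}, d_head=1)
toy_order = greedy_prune(lambda active: (toy_q, toy_k, toy_v), 2, 1, 2)
```

In the toy example the gradients are fixed, so re-scoring does not change the ranking. In a real run the callback would recompute gradients after each removal, which is the step that distinguishes this scheme from a static, score-once ranking.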
