CL | AI | LG | Oct 10, 2025

Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning

arXiv:2510.13832v1 | 1 citation | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses efficiency challenges in deploying transformer models for NLP tasks by offering a more stable and accurate pruning method, though the advance is incremental because it builds on existing gradient-based approaches.

The paper tackles the problem of pruning transformer models efficiently without sacrificing accuracy or stability. It introduces a new pruning criterion, HIES, which integrates head importance scores with attention entropy. The reported result is up to a 15.2% improvement in model quality and a 2.04x improvement in stability over HIS-only methods.

Transformer-based models have achieved remarkable performance in NLP tasks. However, their structural characteristics (multiple layers and attention heads) introduce efficiency challenges in inference and deployment. To address these challenges, various pruning methods have recently been proposed. Notably, gradient-based methods using Head Importance Scores (HIS) have gained traction for their interpretability, efficiency, and ability to identify redundant heads. However, HIS alone has limitations, as it captures only the gradient-driven contribution and overlooks the diversity of attention patterns. To overcome these limitations, we introduce a novel pruning criterion, HIES (Head Importance-Entropy Score), which integrates head importance scores with attention entropy, providing complementary evidence on per-head contribution. Empirically, HIES-based pruning yields up to 15.2% improvement in model quality and 2.04x improvement in stability over HIS-only methods, enabling substantial model compression without sacrificing either accuracy or stability. Code will be released upon publication.
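
To make the idea of combining head importance with attention entropy concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the HIES formula, so the min-max normalization, the convex weight `alpha`, and the choice to treat higher entropy as evidence for keeping a head are all assumptions made for illustration. The helper names (`attention_entropy`, `combined_score`, `heads_to_prune`) are hypothetical, and the importance values are placeholders standing in for gradient-based scores computed elsewhere.

```python
import numpy as np


def attention_entropy(attn, eps=1e-12):
    """Mean Shannon entropy of each head's attention distributions.

    attn: array of shape (batch, heads, query_len, key_len) whose last
    axis sums to 1. Returns an array of shape (heads,).
    """
    ent = -np.sum(attn * np.log(attn + eps), axis=-1)  # (batch, heads, query_len)
    return ent.mean(axis=(0, 2))


def min_max(x, eps=1e-12):
    """Rescale a 1-D score vector to roughly [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + eps)


def combined_score(importance, entropy, alpha=0.5):
    """ASSUMED combination rule (not taken from the paper): a convex mix
    of normalized importance and normalized entropy."""
    return alpha * min_max(importance) + (1.0 - alpha) * min_max(entropy)


def heads_to_prune(score, ratio):
    """Indices of the lowest-scoring heads, given a pruning ratio in [0, 1]."""
    k = int(len(score) * ratio)
    return np.argsort(score)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(2, 4, 5, 5))  # toy batch: 2 inputs, 4 heads
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    importance = np.array([0.9, 0.1, 0.5, 0.3])  # placeholder importance scores
    score = combined_score(importance, attention_entropy(attn))
    print(heads_to_prune(score, ratio=0.5))
```

Setting `alpha=1.0` reduces this sketch to ranking by importance alone, which gives a simple baseline for checking how much the entropy term changes the set of pruned heads.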
