LG · AI · Mar 20, 2025

Accelerating Transformer Inference and Training with 2:4 Activation Sparsity

Daniel Haziza, Timothy Chou, Dhruv Choudhary, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut, Jesse Cai

arXiv:2503.16672v1 · 18.810 citations · h-index: 19

Originality: Incremental advance

AI Analysis

This work addresses efficiency challenges in large language models for AI practitioners, though it is incremental as it builds on existing hardware-accelerated sparsity techniques.

The paper tackles the problem of accelerating large language model training and inference by applying 2:4 sparsity patterns to activations, achieving up to 1.3x faster Feed Forward Networks with no accuracy loss.

In this paper, we demonstrate how to apply 2:4 sparsity, a popular hardware-accelerated GPU sparsity pattern, to activations in order to accelerate large language model training and inference. Crucially, we exploit the intrinsic sparsity found in Squared-ReLU activations to provide this acceleration with no accuracy loss. Our approach achieves up to 1.3x faster Feed Forward Networks (FFNs) in both the forward and backward passes. This work highlights the potential for sparsity to play a key role in accelerating large language model training and inference.
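To make the 2:4 pattern concrete, here is a minimal pure-Python sketch. It assumes only the general definition of 2:4 sparsity (at most two nonzero values in every contiguous group of four) and the usual Squared-ReLU definition, max(x, 0)^2. The helper names `squared_relu`, `prune_2_4`, and `satisfies_2_4` are hypothetical, and dropping the smallest-magnitude entries in a group that has more than two nonzeros is an illustrative assumption; the abstract does not say how the paper handles such groups. The sketch shows the pattern only and does not reproduce any GPU speedup.

```python
def squared_relu(x):
    """Squared-ReLU: max(x, 0) ** 2, exactly zero for every non-positive input."""
    return max(x, 0.0) ** 2


def prune_2_4(row):
    """Force a 2:4 pattern: keep the 2 largest-magnitude values per group of 4.

    Assumption for illustration only: magnitude-based selection is a common
    choice, not something the abstract confirms.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


def satisfies_2_4(row):
    """True if every contiguous group of 4 has at most 2 nonzero values."""
    return all(
        sum(1 for v in row[s:s + 4] if v != 0) <= 2
        for s in range(0, len(row), 4)
    )


pre_activations = [-1.0, 2.0, -0.5, 0.5, 3.0, 1.0, 2.0, -4.0]
activations = [squared_relu(x) for x in pre_activations]
pruned = prune_2_4(activations)
```

In this toy input, the first group of four already has only two nonzeros after Squared-ReLU, so enforcing the pattern changes nothing there; the second group has three nonzeros, so one value is dropped. The more often activations look like the first group, the less information the 2:4 constraint discards.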
