PF · AI · CL · May 8, 2018

Online normalizer calculation for softmax

arXiv:1805.02867v2 · 182 citations
Originality: Incremental advance
AI Analysis

This incremental improvement addresses performance bottlenecks in machine learning models that rely on Softmax, such as attention mechanisms in transformers.

The paper tackles the computational inefficiency of the Softmax function by reducing the number of memory accesses it needs, yielding speedups of up to 1.3x for Softmax alone and up to 5x for fused Softmax+TopK.
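As a rough back-of-the-envelope check, assume the kernel is purely memory-bound and count one memory access per vector element for every load or store. Under that assumption, the reported Softmax speedup is close to the ratio of per-element accesses between the classical "safe" three-pass Softmax and the single-pass online normalizer followed by an output pass:

```latex
\begin{aligned}
\text{safe Softmax:}\quad & \underbrace{1}_{\text{load for } \max} + \underbrace{1}_{\text{load for } \sum} + \underbrace{1}_{\text{load for } y_i} + \underbrace{1}_{\text{store } y_i} = 4 \text{ accesses per element} \\
\text{online Softmax:}\quad & \underbrace{1}_{\text{load for } m,\ d} + \underbrace{1}_{\text{load for } y_i} + \underbrace{1}_{\text{store } y_i} = 3 \text{ accesses per element} \\
\text{ideal speedup:}\quad & 4 / 3 \approx 1.33
\end{aligned}
```

This is only an upper-bound estimate for a bandwidth-limited kernel; caching, launch overhead, and compute costs all affect the measured figure.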

The Softmax function is ubiquitous in machine learning, and multiple previous works have suggested faster alternatives to it. In this paper we propose a way to compute classical Softmax with fewer memory accesses and hypothesize that this reduction in memory accesses should improve Softmax performance on actual hardware. The benchmarks confirm this hypothesis: Softmax is accelerated by up to 1.3x, and Softmax+TopK, combined and fused, by up to 5x.
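A minimal Python sketch of the idea, assuming the single-pass "online normalizer" recurrence: keep a running maximum m and a running sum d of exp(x_j - m), and rescale d by exp(m_old - m_new) whenever the maximum grows. This merges the separate max and sum passes of safe Softmax into one pass. The fused TopK variant relies on Softmax being monotonic, so the k largest inputs give the k largest outputs and only those k probabilities need to be written. The function names are hypothetical, and the `heapq` min-heap used to track the top k inputs is an illustrative choice, not necessarily what the paper's GPU kernels use.

```python
import heapq
import math
from typing import List, Tuple


def safe_softmax(x: List[float]) -> List[float]:
    """Reference 'safe' Softmax: separate passes for max, sum, and output."""
    m = max(x)                                   # pass 1: maximum
    d = sum(math.exp(v - m) for v in x)          # pass 2: normalizer
    return [math.exp(v - m) / d for v in x]      # pass 3: outputs


def online_normalizer(x: List[float]) -> Tuple[float, float]:
    """One pass returning (m, d) with m = max(x), d = sum(exp(x_j - m))."""
    m, d = -math.inf, 0.0
    for v in x:
        m_new = max(m, v)
        # Rescale the partial sum to the new max, then add the new term.
        # On the first element exp(-inf) == 0.0, so d starts at exp(0) == 1.
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
    return m, d


def online_softmax(x: List[float]) -> List[float]:
    """Online Softmax: one pass for (m, d), one pass for the outputs."""
    m, d = online_normalizer(x)
    return [math.exp(v - m) / d for v in x]


def online_softmax_topk(x: List[float], k: int) -> List[Tuple[int, float]]:
    """Fused sketch: compute (m, d) and the top-k inputs in the same pass,
    then emit (index, probability) pairs for only those k elements."""
    m, d = -math.inf, 0.0
    heap: List[Tuple[float, int]] = []           # min-heap of (value, index)
    for i, v in enumerate(x):
        m_new = max(m, v)
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            heapq.heapreplace(heap, (v, i))      # evict the smallest kept value
    top = sorted(heap, reverse=True)             # largest input first
    return [(i, math.exp(v - m) / d) for v, i in top]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 0.5]
    print(safe_softmax(x))
    print(online_softmax(x))
    print(online_softmax_topk(x, 2))
```

Both `safe_softmax` and `online_softmax` subtract the maximum before exponentiating, so large inputs such as `[1000.0, 1001.0]` do not overflow; the online version simply discovers the maximum incrementally instead of in a dedicated pass.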

Code Implementations: 1 repo