LGDec 5, 2024

Approximate Top-$k$ for Increased Parallelism

Oscar Key, Luka Ribar, Alberto Cattaneo, Luke Hudlass-Galley, Douglas Orr

arXiv:2412.04358v110.46 citationsh-index: 4

Originality Incremental advance

AI Analysis

This addresses a bottleneck in sparsity algorithms for language models, enabling more efficient computation on parallel hardware, though it is incremental as it builds on existing approximate methods.

The paper tackles the problem of limited parallelism in exact top-k computations by evaluating bucketed approximate top-k algorithms, showing they can dramatically increase parallelism for machine learning accelerators, with a fast PyTorch implementation released.

We present an evaluation of bucketed approximate top-$k$ algorithms. Computing top-$k$ exactly suffers from limited parallelism, because the $k$ largest values must be aggregated along the vector, thus is not well suited to computation on highly-parallel machine learning accelerators. By relaxing the requirement that the top-$k$ is exact, bucketed algorithms can dramatically increase the parallelism available by independently computing many smaller top-$k$ operations. We explore the design choices of this class of algorithms using both theoretical analysis and empirical evaluation on downstream tasks. Our motivating examples are sparsity algorithms for language models, which often use top-$k$ to select the most important parameters or activations. We also release a fast bucketed top-$k$ implementation for PyTorch.

View on arXiv PDF

Similar