LG | Jun 21, 2025

EQuARX: Efficient Quantized AllReduce in XLA for Distributed Machine Learning Acceleration

arXiv:2506.17615v1 | 1 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses communication bottlenecks in distributed machine learning for large language models, offering a domain-specific optimization that is incremental but provides concrete speedups.

The paper tackles the performance overhead of inter-device communication in distributed large language models by introducing EQuARX, a native dynamic block-wise quantized AllReduce in XLA for TPUs. With int8 precision it achieves up to a 1.8X speedup over baseline BF16 AllReduce, and it accelerates the prefill stage of Gemma 3 27B by 1.25X and of Gemma 3 12B by 1.1X, with small to negligible impact on quality.

While Large Language Models (LLMs) have become highly influential, their enormous scale presents significant deployment challenges. Efficiently serving these models typically requires distributing them across numerous accelerator devices, which introduces substantial performance overhead from inter-device communication (collectives). Model quantization has been widely adopted to reduce the memory and compute requirements of LLM weights and activations with minimal quality impact, but applying quantization directly to collectives like AllReduce is inherently difficult: the inter-device summation involved can lead to numerical instability or significant error accumulation. In this work, we present a native dynamic block-wise efficient quantized AllReduce within the XLA compiler for TPUs (EQuARX). By using TPU-friendly quantization and deep pipelining of communication and compute, EQuARX with int8 precision achieves a 1.8X speedup over baseline BF16 AllReduce across various network topologies. Furthermore, EQuARX accelerates the prefill stage of Gemma 3 27B by 1.25X and Gemma 3 12B by 1.1X, with small to negligible impact on quality.
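
To make the idea of a block-wise quantized AllReduce concrete, here is a minimal pure-Python sketch. It is not the EQuARX implementation: the abstract does not specify the block size, the scaling scheme, or how partial sums are accumulated, so the block size of 4, the symmetric max-abs scaling, and the helper names (`quantize_blockwise`, `dequantize_blockwise`, `simulated_quantized_allreduce`) are all assumptions made for illustration. The sketch dequantizes each device's contribution and accumulates in floating point, which is one simple way to avoid overflowing int8 during the summation.

```python
import random

BLOCK = 4  # hypothetical block size; the abstract does not state the real one


def quantize_blockwise(values, block=BLOCK):
    """Quantize floats to int8-range codes with one max-abs scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        max_abs = max(abs(v) for v in chunk)
        # "Dynamic" here means the scale is derived from the data at run time.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest code and clamp to the symmetric int8 range.
        codes.extend(max(-127, min(127, round(v / scale))) for v in chunk)
    return codes, scales


def dequantize_blockwise(codes, scales, block=BLOCK):
    """Map int8 codes back to floats using the scale of their block."""
    return [c * scales[i // block] for i, c in enumerate(codes)]


def simulated_quantized_allreduce(per_device, block=BLOCK):
    """Sum one vector per device, sending each as int8 codes plus scales.

    Assumption: contributions are dequantized before being added, so the
    running sum is kept in floating point rather than in int8.
    """
    total = [0.0] * len(per_device[0])
    for vec in per_device:
        codes, scales = quantize_blockwise(vec, block)      # "send" side
        received = dequantize_blockwise(codes, scales, block)  # "receive" side
        total = [t + r for t, r in zip(total, received)]
    return total


# Small demonstration with 4 simulated devices and 16 values each.
random.seed(0)
devices = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(4)]
exact = [sum(column) for column in zip(*devices)]
approx = simulated_quantized_allreduce(devices)
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print("max absolute error:", max_error)
```

In this sketch each device contributes at most half a quantization step of error per element, so the error of the sum grows with the number of devices. That is the error-accumulation concern the abstract raises. A real implementation would also have to decide how to requantize partial sums between communication steps, which this example leaves out.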
