NALG · Oct 26, 2025

On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication

arXiv:2511.00025v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the reliability of deep learning under hardware non-determinism by challenging the stochastic view of numerical noise. It is an incremental advance: it contributes empirical analysis rather than a new solution.

The paper tackles floating-point non-associativity, which makes GPU matrix multiplication non-deterministic, and shows that the common assumption of i.i.d. Gaussian noise fails. Empirically, the prediction flip rate is 0.00%, and in float16 nearly 50% of the total error variance lies in off-diagonal (correlated) covariance terms, so the error is structured rather than independent.
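To make "non-associativity" concrete, here is a minimal NumPy sketch that accumulates the same three float16 values in two orders, rounding to float16 after every addition. It is an illustration under stated assumptions, not the paper's setup: real GPU matmul kernels often accumulate float16 products in float32 and choose their reduction order per kernel, but the rounding effect is the same in kind. The helper name `sum_fp16` is hypothetical.

```python
import numpy as np

def sum_fp16(values):
    """Hypothetical helper: accumulate left to right, rounding to float16
    after every addition (emulating a float16 accumulator)."""
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return acc

# Near 2048 the float16 spacing is 2, so 2048 + 1 = 2049 is a tie that
# rounds to even (2048); adding the two 1.0s first avoids that loss.
left_to_right = sum_fp16([2048.0, 1.0, 1.0])  # (2048 + 1) + 1
small_first = sum_fp16([1.0, 1.0, 2048.0])    # (1 + 1) + 2048

print(float(left_to_right), float(small_first))  # → 2048.0 2050.0
```

Batched and single-input matmuls can take different reduction orders for the same output element, which is exactly this kind of discrepancy at scale.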

Floating-point non-associativity makes fundamental deep learning operations, such as matrix multiplication (matmul) on GPUs, inherently non-deterministic. Despite this, the statistical structure of the resulting numerical error remains poorly understood. A common working assumption is that these errors behave as independent and identically distributed (i.i.d.) Gaussian noise. In this paper, we empirically test this assumption and show that it fails to describe real GPU behavior. By comparing outputs of single-input and batched matmuls, we find that while the i.i.d. model predicts non-zero output instability, empirical results show a 0.00% prediction flip rate. Through covariance analysis, we uncover the cause: the floating-point error is structured and highly correlated. For float16, nearly 50% of the total error variance lies in off-diagonal terms, revealing that the noise behaves as a coordinated, directional perturbation rather than random static. This result challenges the prevailing stochastic view of numerical noise and provides a principled foundation for analyzing deep learning reliability under hardware non-determinism.
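The analysis described above can be sketched on a CPU as follows. This is a hedged emulation, not the authors' code: two "kernels" are mimicked by splitting the K dimension into different chunk sizes (a stand-in for batched vs single-input reduction orders), the error vector is their difference, and three quantities are computed: the observed argmax flip rate, the flip rate predicted by i.i.d. Gaussian noise with the same overall standard deviation, and an off-diagonal covariance share. All function names and the exact definition of the off-diagonal share (absolute covariance mass off the diagonal) are assumptions; the printed numbers come from synthetic data and are not the paper's results.

```python
import numpy as np

def matmul_fp16(x, w, chunk):
    """Hypothetical emulation of one reduction order: split K into chunks,
    round each chunk's partial product to float16, then add the partial
    sums in float16. Different `chunk` values mimic different kernels."""
    x16, w16 = x.astype(np.float16), w.astype(np.float16)
    out = np.zeros((x.shape[0], w.shape[1]), dtype=np.float16)
    for start in range(0, x.shape[1], chunk):
        part = x16[:, start:start + chunk] @ w16[start:start + chunk, :]
        out = (out + part).astype(np.float16)
    return out

def flip_rate(logits_a, logits_b):
    """Fraction of rows whose predicted class (argmax) differs."""
    return float(np.mean(logits_a.argmax(axis=1) != logits_b.argmax(axis=1)))

def iid_gaussian_flip_rate(logits, errors, rng, trials=20):
    """Flip rate under an i.i.d. model: independent Gaussian noise whose
    single standard deviation matches the observed errors."""
    sigma = errors.astype(np.float64).std()
    rates = [flip_rate(logits, logits + rng.normal(0.0, sigma, logits.shape))
             for _ in range(trials)]
    return float(np.mean(rates))

def offdiag_variance_share(errors):
    """errors: (n_samples, n_outputs). Share of absolute covariance mass
    lying off the diagonal (one possible reading of the metric)."""
    cov = np.cov(errors.astype(np.float64), rowvar=False)
    total = np.abs(cov).sum()
    diag = np.abs(np.diag(cov)).sum()
    return 0.0 if total == 0 else float((total - diag) / total)

# Synthetic usage: 2000 inputs, K = 256, 10 output logits.
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 256))
w = rng.standard_normal((256, 10)) / 16.0
ref = matmul_fp16(x, w, chunk=256)   # one reduction order
alt = matmul_fp16(x, w, chunk=16)    # a different reduction order
err = alt.astype(np.float64) - ref.astype(np.float64)

print("observed flip rate:", flip_rate(ref, alt))
print("i.i.d. Gaussian flip rate:",
      iid_gaussian_flip_rate(ref.astype(np.float64), err, rng))
print("off-diagonal share:", offdiag_variance_share(err))
```

Matching a single sigma across all coordinates is what makes the baseline "identically distributed"; a correlated error with the same marginal variance can move logits together and leave the argmax unchanged, which is the intuition behind the paper's 0.00% flip rate.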
