Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models
For LLM practitioners and evaluators, this work formalizes a known but poorly characterized source of output variability, though the contribution is primarily conceptual with preliminary empirical support.
The paper introduces the concept of background temperature to quantify the nondeterminism that implementation-level factors induce in LLM outputs, even at a nominal temperature of zero. Pilot experiments across major LLM providers demonstrate the phenomenon and its implications for reproducibility.
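Floating-point non-associativity is one implementation-level factor behind this effect. The toy Python sketch below shows how it can flip a greedy ($T=0$) choice. Three contributions to one token's logit are reduced in two different orders, standing in for two kernel schedules, and the resulting logits pick different tokens. The helper names and numbers are illustrative assumptions, not taken from the paper.

```python
def sequential_sum(values):
    """Accumulate floats strictly left to right.

    Written as an explicit loop because math.fsum, and sum() on floats in
    Python 3.12+, compensate for rounding and would hide the effect.
    """
    acc = 0.0
    for v in values:
        acc += v
    return acc


def greedy_pick(logits):
    """Temperature-zero (greedy) decoding: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial contributions to the logit of token B. The magnitudes
# are exaggerated so that a single rounding step is visible in float64.
contributions = [1.0, 2.0**53, -2.0**53]

# Same numbers, two reduction orders (standing in for two kernel schedules).
logit_b_forward = sequential_sum(contributions)             # 1.0 is absorbed by 2**53: result 0.0
logit_b_reversed = sequential_sum(reversed(contributions))  # large terms cancel first: result 1.0

logit_a = 0.5  # competing token A, computed identically in both runs

print(greedy_pick([logit_a, logit_b_forward]))   # 0 (token A)
print(greedy_pick([logit_a, logit_b_reversed]))  # 1 (token B)
```

The magnitudes here are artificial. Lower-precision formats such as bf16 round much more coarsely, so far smaller reordering effects can flip a near-tie between candidate tokens.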
Even when decoding with temperature $T=0$, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this short note we formalize this behavior by introducing the notion of \emph{background temperature} $T_{\mathrm{bg}}$: the effective temperature induced by an implementation-dependent perturbation process that is observed even at nominal $T=0$. We provide clean definitions, show how $T_{\mathrm{bg}}$ relates to a stochastic perturbation governed by the inference environment $I$, and propose an empirical protocol to estimate $T_{\mathrm{bg}}$ via the equivalent temperature $T_n(I)$ of an ideal reference system. We conclude with pilot experiments on a representative pool of models from the major LLM providers that demonstrate the idea, and we outline implications for reproducibility, evaluation, and deployment.
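The following Python sketch shows one plausible way to estimate such an equivalent temperature. The matching criterion is an assumption, not necessarily the authors' protocol. We assume access to the logits of an ideal, perturbation-free reference system. We record how often each token is emitted across repeated nominal $T=0$ runs in one environment $I$. We then choose the temperature whose tempered softmax maximizes the likelihood of those counts. The resulting `t_hat` is only a stand-in for $T_n(I)$, and the function names, grid, and data are hypothetical.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-probabilities of softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def log_likelihood(counts, logits, temperature):
    """Log-likelihood of observed token counts under the tempered reference model."""
    log_probs = log_softmax(logits, temperature)
    return sum(c * lp for c, lp in zip(counts, log_probs))


def equivalent_temperature(counts, reference_logits, grid):
    """Hypothetical estimator: the grid temperature that best explains the counts.

    `counts[i]` is how often token i was emitted across repeated nominal T=0
    runs in one inference environment; `reference_logits` come from an ideal,
    perturbation-free reference system for the same prompt and position.
    """
    return max(grid, key=lambda t: log_likelihood(counts, reference_logits, t))


# Toy data (illustrative only, not results from the paper).
reference_logits = [2.0, 1.5, 0.0]
counts = [90, 10, 0]  # 100 nominal T=0 runs: token 0 ninety times, token 1 ten times
grid = [k / 100 for k in range(1, 301)]  # candidate temperatures 0.01 .. 3.00

t_hat = equivalent_temperature(counts, reference_logits, grid)
print(f"equivalent temperature ~ {t_hat:.2f}")
```

The tempered softmax is an exponential family in $1/T$, so the log-likelihood is concave in $1/T$. At an interior optimum, the model's expected reference logit equals the empirical mean logit of the observed tokens. A one-dimensional root finder could therefore replace the grid search, which is used here only to keep the sketch dependency-free.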