An Additive Approximation Scheme for Generating Dyadic Codings for the Outputs of an LLM
For researchers in information theory and LLM security, this provides a principled framework for embedding hidden information with provable statistical undetectability, though the problem is niche and the approach is incremental.
The paper develops a polynomial-time additive approximation scheme for generating dyadic codings of discrete distributions (e.g., LLM next-token distributions) under rate constraints, minimizing total variation distance. The method provides provable near-optimal guarantees and is applied to LLM steganography, where rate corresponds to hidden bits per token and total variation bounds detectability.
We study the problem of approximating a discrete probability distribution, such as the next-token distribution of a large language model, by a dyadic distribution induced by a binary tree under encoding rate constraints. The objective is to partition the support of the distribution and assign dyadic probabilities to minimize total variation distance while achieving a prescribed rate. We formulate this task as a tree-based partitioning problem and develop a polynomial-time additive approximation scheme for the rate-constrained setting in the constant-rate regime. Our results provide provable guarantees for near-optimal dyadic approximations and, as an application, yield a principled framework for LLM-based steganography, where the rate maps to bits of hidden information embedded per token and the total variation bound controls statistical detectability.