A Statistical-Physics Refinement of Soft Covering
For information theory researchers, this provides a refined statistical-physics understanding of random code output distributions, though the results are theoretical and incremental.
The paper derives a single-letter formula for the annealed free energy of a random code's output distribution, revealing a competition between bulk and sparse codeword populations, and characterizes the phase diagram with explicit boundaries. Applications to guesswork, channel resolvability, and hypothesis testing are discussed.
We study the channel output distribution induced by a rate-$R$ random code via statistical physics. The partition function is $Z_n(β|\mathcal{C}) = \sum_{y^n}[P_{Y^n|\mathcal{C}}(y^n)]^β$, where $\mathcal{C}$ is the code and $β> 0$ is the inverse temperature. We focus on the free energy, i.e., the normalized logarithm $\frac{1}{n}\log Z_n(β|\mathcal{C})$, which encodes the full Rényi spectrum of the output distribution. The single-letter formula we derive for the annealed free energy decomposes into two branches that reflect a ``competition'' between two populations of codewords: the \emph{bulk branch}, $ψ_{\mbox{\tiny b}}(β,R)$, driven by typical codewords, and the \emph{sparse branch}, $ψ_{\mbox{\tiny s}}(β,R)$, driven by atypical codewords, where the qualifiers `typical' and `atypical' are meant in a sense that will become apparent later. We analyze the phase structure of each branch separately and characterize their competition. Both branches are derived for all $β> 0$. The phase boundary $R^\star(β)$, where the two branches are equal, is analyzed for $β\geq 1$, where it has an explicit closed-form expression. The phase diagram in the first quadrant of the $(β, R)$ plane has four regions separated by three boundaries: $R = I^{\mbox{\tiny b}}(β)$ (bulk branch transition), $R = R^\star(β)$ (bulk--sparse competition boundary), and $R = I^{\mbox{\tiny s}}(β)$ (sparse branch transition), all meeting at the point $(β, R) = (1, I(X;Y))$, where $I(X;Y)$ is the mutual information induced by the input type and the channel. Applications to guesswork, channel resolvability, and hypothesis testing are discussed, and all results are illustrated with a numerical example of a Z-channel.
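The definitions above can be checked numerically at tiny block lengths. The following is a minimal brute-force sketch, not the paper's single-letter formula: it enumerates all $2^n$ outputs of a binary Z-channel, draws codewords uniformly from a fixed binary type class, and estimates both the quenched free energy (the average over codes of $\frac{1}{n}\ln Z_n$) and a finite-$n$ annealed one ($\frac{1}{n}\ln$ of the average of $Z_n$). Recall that for $β \neq 1$ the Rényi entropy is $H_β(Y^n) = \ln Z_n(β|\mathcal{C})/(1-β)$, which is why the free energy carries the Rényi spectrum. It also computes $I(X;Y)$, the rate coordinate of the point where the three boundaries meet. The channel convention (input 0 noiseless, input 1 flipped to 0 with probability `p`), rates in nats, $M = \mathrm{round}(e^{nR})$, constant-composition codewords, and all function names are assumptions made for illustration.

```python
import itertools
import math

import numpy as np


def binary_entropy(x):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def z_channel_mutual_information(q, p):
    """I(X;Y) in nats for input type P(X=1) = q under the assumed Z-channel."""
    # P(Y=1) = q(1-p); only input 1 is noisy, so H(Y|X) = q * h(p).
    return binary_entropy(q * (1.0 - p)) - q * binary_entropy(p)


def z_channel_matrix(p):
    """W[x, y]: input 0 is noiseless, input 1 flips to 0 w.p. p (assumed)."""
    return np.array([[1.0, 0.0], [p, 1.0 - p]])


def random_constant_composition_code(M, n, k, rng):
    """M codewords drawn uniformly from the binary type class with k ones."""
    base = np.array([1] * k + [0] * (n - k))
    return np.stack([rng.permutation(base) for _ in range(M)])


def output_distribution(code, W):
    """P_{Y^n|C}(y^n) = (1/M) sum_m prod_i W(y_i | x_{m,i}) for every y^n."""
    n = code.shape[1]
    ys = np.array(list(itertools.product([0, 1], repeat=n)))  # (2^n, n)
    # Fancy indexing broadcasts to shape (M, 2^n, n): W(y_i | x_{m,i}).
    per_letter = W[code[:, None, :], ys[None, :, :]]
    return per_letter.prod(axis=2).mean(axis=0)


def partition_function(P, beta):
    """Z_n(beta | C) = sum_{y^n} P(y^n)^beta, for beta > 0."""
    return float(np.sum(P ** beta))


def free_energies(n, R, q, p, beta, num_codes=200, seed=0):
    """Return (quenched, annealed) estimates in nats per symbol.

    quenched: average over sampled codes of (1/n) ln Z_n
    annealed: (1/n) ln of the average of Z_n over sampled codes
    """
    rng = np.random.default_rng(seed)
    M = max(1, round(math.exp(n * R)))  # rate measured in nats (assumed)
    k = round(q * n)                    # number of ones in the input type
    W = z_channel_matrix(p)
    zs = []
    for _ in range(num_codes):
        code = random_constant_composition_code(M, n, k, rng)
        zs.append(partition_function(output_distribution(code, W), beta))
    zs = np.array(zs)
    quenched = float(np.mean(np.log(zs)) / n)
    annealed = float(np.log(np.mean(zs)) / n)
    return quenched, annealed


if __name__ == "__main__":
    quenched, annealed = free_energies(n=10, R=0.3, q=0.5, p=0.2, beta=2.0)
    print(f"quenched ~ {quenched:.4f}, annealed ~ {annealed:.4f} nats/symbol")
    print(f"I(X;Y) = {z_channel_mutual_information(0.5, 0.2):.4f} nats")
```

By Jensen's inequality the sampled annealed value is never below the sampled quenched one, and for $β > 1$ both are nonpositive because $P_{Y^n|\mathcal{C}} \le 1$ implies $Z_n \le 1$. The enumeration is exponential in $n$, so this only serves as a sanity check on small examples.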