2.2ITMay 19
Soft Covering Through the Lens of Hypothesis TestingNeri Merhav
We study the soft covering phenomenon through the lens of Neyman--Pearson hypothesis testing: given a channel output sequence $y^n$, can one decide whether it was produced when the channel was driven by a random codeword, or generated independently from the output marginal? We derive exact exponential decay rates for the jointly averaged false-alarm (FA) probability $α_n(τ,R)$ and missed-detection (MD) probability $β_n(τ,R)$, as functions of the decision threshold $τ$ and the codebook rate $R$. The derived single-letter formulas of the exponents $\EFA(τ,R)=-\lim_{n\to\infty}\frac{1}{n}\lnα_n(τ,R)$ and $\EMD(τ,R)=-\lim_{n\to\infty}\frac{1}{n}\lnβ_n(τ,R)$ are tight in the random coding sense. The analysis reveals a rich phase structure. For $R < I(X;Y)$, there is a genuine exponential tradeoff between the two error types over the interval $τ\in (0, I(X;Y)-R)$. At $R = I(X;Y)$, this tradeoff interval collapses to the single point $τ= 0$, where both error exponents simultaneously vanish, a fact which manifests the soft covering phenomenon in the Neyman--Pearson sense. For $R > I(X;Y)$, the same instantaneous collapse persists at $τ= 0$; moreover, for every $τ$ at least one exponent is zero: the FA exponent is zero for $τ\le 0$ (FA probability does not decay exponentially), and the MD exponent is zero for $τ\ge 0$ (and finite, channel-specific for $τ<0$; see Remark~\ref{rem:jump}). There is no interval of $τ$ where both exponents are simultaneously positive. A sharp phase transition in the MD exponent occurs at $τ^* = [I(X;Y)-R]_+$ for all rates.
0.5ITMay 3
A Statistical-Physics Refinement of Soft CoveringNeri Merhav
We study the channel output distribution induced by a rate-$R$ random code via statistical physics. The partition function is $Z_n(β|\mathcal{C}) = \sum_{y^n}[P_{Y^n|\mathcal{C}}(y^n)]^β$, where $\mathcal{C}$ is the code and $β> 0$ is inverse temperature. Our focus is on the free energy which is the normalized logarithm of this quantity, which encodes the full Rényi spectrum of the output distribution. The single-letter formula derived for the annealed free energy decomposes into two branches which reflect a ``competition'' between two populations of codewords. One is the \emph{bulk branch}, $ψ_{\mbox{\tiny b}}(β,R)$, which is driven by typical codewords and the other one is the \emph{sparse branch} $ψ_{\mbox{\tiny s}}(β,R)$, which is driven by a-typical codewords, where the qualifiers `typical' and `atypical' are in a sense to become apparent later. We analyze the phase structure of each branch separately and characterize their competition. Both branches are derived for all $β> 0$. The phase boundary $R^\star(β)$, where the two branches are equal, is analyzed for $β\geq 1$, where it has an explicit closed-form expression. The phase diagram in the first quadrant of the $(β, R)$ plane has four regions separated by three boundaries: $R = I^{\mbox{\tiny b}}(β)$ (bulk branch transition), $R = R^\star(β)$ (bulk--sparse competition boundary), and $R = I^{\mbox{\tiny s}}(β)$ (sparse branch transition), all meeting at the point $(β, R) = (1, I(X;Y))$, where $I(X;Y)$ is the mutual information induced by the input type and the channel. Applications to guesswork, channel resolvability, and hypothesis testing are discussed, and all results are illustrated with a numerical example of a Z-channel.
32.3STAT-MECHApr 1
Statistical Physics of Coding for the IntegersNeri Merhav
We study a paradigm of coding for compression of the natural numbers via the zeta distribution and develop a statistical-mechanical interpretation, both in terms of Hagedorn systems and a Bose gas with energy levels given by logarithms of prime numbers. We also propose a simple coding scheme for the zeta distribution that nearly achieves the ideal code length. For block coding of vectors of natural numbers, we derive the micro-canonical entropy function and demonstrate its asymptotic linearity implying that its behavior is analogous to that of a Hagedorn system. We also derive the large deviations rate function, and provide a formula for the best coding parameter in the large deviations sense. We show that due the Hagedorn-type phase transition there is only partial equivalence of ensembles, due to the degeneration of the domain of the partition function.
ITDec 27, 2021
Universal Randomized Guessing Subjected to DistortionAsaf Cohen, Neri Merhav
In this paper, we consider the problem of guessing a sequence subject to a distortion constraint. Specifically, we assume the following game between Alice and Bob: Alice has a sequence $\bx$ of length $n$. Bob wishes to guess $\bx$, yet he is satisfied with finding any sequence $\hat{\bx}$ which is within a given distortion $D$ from $\bx$. Thus, he successively submits queries to Alice, until receiving an affirmative answer, stating that his guess was within the required distortion. Finding guessing strategies which minimize the number of guesses (the \emph{guesswork}), and analyzing its properties (e.g., its $ρ$--th moment) has several applications in information security, source and channel coding. Guessing subject to a distortion constraint is especially useful when considering contemporary biometrically--secured systems, where the "password" which protects the data is not a single, fixed vector but rather a \emph{ball of feature vectors} centered at some $\bx$, and any feature vector within the ball results in acceptance. We formally define the guessing problem under distortion in \emph{four different setups}: memoryless sources, guessing through a noisy channel, sources with memory and individual sequences. We suggest a randomized guessing strategy which is asymptotically optimal for all setups and is \emph{five--fold universal}, as it is independent of the source statistics, the channel, the moment to be optimized, the distortion measure and the distortion level.