Neri Merhav

h-index: 46

3 papers

8,574 citations

3 Papers

7.4 STAT-MECH Apr 1

Statistical Physics of Coding for the Integers

Neri Merhav

We study a paradigm of coding for compression of the natural numbers via the zeta distribution and develop a statistical-mechanical interpretation, both in terms of Hagedorn systems and a Bose gas with energy levels given by logarithms of prime numbers. We also propose a simple coding scheme for the zeta distribution that nearly achieves the ideal code length. For block coding of vectors of natural numbers, we derive the micro-canonical entropy function and demonstrate its asymptotic linearity, implying that its behavior is analogous to that of a Hagedorn system. We also derive the large deviations rate function, and provide a formula for the best coding parameter in the large deviations sense. We show that, due to the Hagedorn-type phase transition, there is only partial equivalence of ensembles, owing to the degeneration of the domain of the partition function.
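As a rough numerical illustration of the quantities named in this abstract (not the coding scheme proposed in the paper), the sketch below assumes the standard zeta distribution $P(n)=n^{-s}/\zeta(s)$ for $s>1$, whose ideal code length is $-\log_2 P(n)=s\log_2 n+\log_2\zeta(s)$. It also checks the Euler product $\zeta(s)=\prod_p (1-p^{-s})^{-1}$, which is the usual way a Bose gas with energy levels $\log p$ enters. The function names, the truncation length `terms`, and the bound `prime_bound` are illustrative choices.

```python
import math

def zeta(s, terms=100_000):
    """Approximate zeta(s) for s > 1: partial sum plus an integral estimate of the tail."""
    partial = sum(n ** -s for n in range(1, terms + 1))
    return partial + terms ** (1 - s) / (s - 1)

def ideal_code_length_bits(n, s):
    """Ideal code length -log2 P(n) under the zeta distribution P(n) = n^(-s) / zeta(s)."""
    return s * math.log2(n) + math.log2(zeta(s))

def euler_product(s, prime_bound=10_000):
    """Product over primes p <= prime_bound of 1 / (1 - p^(-s)), using a simple sieve."""
    sieve = [True] * (prime_bound + 1)
    sieve[0] = sieve[1] = False
    product = 1.0
    for p in range(2, prime_bound + 1):
        if sieve[p]:
            product /= 1.0 - p ** -s
            for multiple in range(p * p, prime_bound + 1, p):
                sieve[multiple] = False
    return product

if __name__ == "__main__":
    s = 2.0  # assumed parameter value, for illustration only
    for n in (1, 2, 10, 1000):
        print(n, round(ideal_code_length_bits(n, s), 3))
    print(zeta(s), euler_product(s))
```

The code length grows like $s\log_2 n$, so larger $s$ favors small integers more strongly at the cost of longer codewords for large ones.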

1.7 IT Jun 25

Grouped Reverse Importance Sampling for the Partition Function

Neri Merhav

We introduce and analyze several grouped variants of the method of reverse importance sampling (RIS) for estimating a partition function from samples of the Boltzmann distribution $p(x)=e^{-\beta U(x)}/Z(\beta)$. Ordinary RIS weighs each sample separately. By contrast, our proposed grouped RIS (GRIS) methods are based on assigning the samples into groups (or batches) of size $k\ge 2$ and applying a joint weight function to each group. The focal point of the research is the quest for a tractable weight function that would yield the smallest possible mean squared error (MSE). A simple identity relates the normalized MSE to the chi-squared divergence between the joint-weight distribution and the distribution of the $k$-fold sum of independent energies. Our first theoretical finding is that any weight that improves on ordinary RIS ($k=1$) must couple the group components. In other words, it must not be a product-form function across those components, as product-form weight functions always worsen the MSE. Our second, and more important, finding is that, without loss of optimality, it is sufficient to seek weight functions that depend only on the total energy, $\sum_i U(x_i)$, of the group (group-energy weight functions); for the sliding-window variants, the analogous result is open. This finding simplifies both the theoretical analysis and the application of the method substantially. For $k=2$ and $k=3$, the MSE associated with non-overlapping (NOL) groups is reduced by $20$--$65\%$ across three examples. We then propose two additional variants of GRIS, both based on sliding-window grouping (as opposed to NOL grouping). The first applies a fixed weight sliding window (FSW) across all (cyclic) shifts of the sliding window, and the second allows a variable-weight sliding window (VSW). The FSW scheme improves on the NOL one, and the VSW improves even further, as will be demonstrated numerically.
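For orientation, ordinary ($k=1$) RIS can be sketched as follows, under the assumption that it takes the common form based on the identity $E_p[w(X)\,e^{\beta U(X)}]=1/Z(\beta)$, valid for any normalized density $w$ supported where $p$ is positive. The abstract does not spell out the estimator, so this is one standard formulation and not necessarily the paper's exact definition; the grouped variants are not implemented here. The quadratic energy, the Gaussian weight with standard deviation 0.8, and all function names are illustrative assumptions.

```python
import math
import random

def ris_inverse_partition(samples, energy, beta, weight_density):
    """Average of w(x) * exp(beta * U(x)) over samples drawn from p(x) = exp(-beta * U(x)) / Z.

    Under the assumed identity, this average estimates 1 / Z(beta).
    """
    total = sum(weight_density(x) * math.exp(beta * energy(x)) for x in samples)
    return total / len(samples)

def gaussian_density(std):
    """Zero-mean Gaussian density, used here as the (assumed) weight function w."""
    norm = std * math.sqrt(2.0 * math.pi)
    return lambda x: math.exp(-0.5 * (x / std) ** 2) / norm

# Toy model (assumption): U(x) = x^2 / 2, so p is Gaussian and Z = sqrt(2 * pi / beta).
beta = 1.0
energy = lambda x: 0.5 * x * x
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0 / math.sqrt(beta)) for _ in range(20_000)]

# The weight is narrower than p, which keeps w(x) * exp(beta * U(x)) bounded.
inv_z_hat = ris_inverse_partition(samples, energy, beta, gaussian_density(0.8))
z_hat = 1.0 / inv_z_hat
z_exact = math.sqrt(2.0 * math.pi / beta)

if __name__ == "__main__":
    print(z_hat, z_exact)
```

The choice of $w$ controls the variance of the estimate, which is the degree of freedom the grouped methods in the abstract extend to joint weights over several samples.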

1.2 IT Dec 27, 2021

Universal Randomized Guessing Subjected to Distortion

Asaf Cohen, Neri Merhav

In this paper, we consider the problem of guessing a sequence subject to a distortion constraint. Specifically, we assume the following game between Alice and Bob: Alice has a sequence $\mathbf{x}$ of length $n$. Bob wishes to guess $\mathbf{x}$, yet he is satisfied with finding any sequence $\hat{\mathbf{x}}$ which is within a given distortion $D$ from $\mathbf{x}$. Thus, he successively submits queries to Alice, until receiving an affirmative answer, stating that his guess was within the required distortion. Finding guessing strategies which minimize the number of guesses (the \emph{guesswork}), and analyzing its properties (e.g., its $\rho$--th moment) has several applications in information security, source and channel coding. Guessing subject to a distortion constraint is especially useful when considering contemporary biometrically--secured systems, where the "password" which protects the data is not a single, fixed vector but rather a \emph{ball of feature vectors} centered at some $\mathbf{x}$, and any feature vector within the ball results in acceptance. We formally define the guessing problem under distortion in \emph{four different setups}: memoryless sources, guessing through a noisy channel, sources with memory and individual sequences. We suggest a randomized guessing strategy which is asymptotically optimal for all setups and is \emph{five--fold universal}, as it is independent of the source statistics, the channel, the moment to be optimized, the distortion measure and the distortion level.
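The guessing game described above can be simulated in a few lines. The sketch below assumes binary sequences and normalized Hamming distortion, and it has Bob draw guesses uniformly at random; this is a deliberately naive stand-in, not the universal randomized strategy proposed in the paper. All names and the cap `max_guesses` are hypothetical.

```python
import random

def hamming_distortion(x, y):
    """Normalized Hamming distortion between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def guess_until_accepted(x, max_distortion, alphabet=(0, 1), rng=None,
                         max_guesses=1_000_000):
    """Bob submits i.i.d. uniform guesses; Alice accepts the first one within max_distortion.

    Returns (accepted_guess, number_of_guesses). Uniform guessing is an
    illustrative assumption, not the paper's strategy.
    """
    rng = rng or random.Random()
    for count in range(1, max_guesses + 1):
        guess = [rng.choice(alphabet) for _ in x]
        if hamming_distortion(x, guess) <= max_distortion:
            return guess, count
    raise RuntimeError("no acceptable guess found within max_guesses")

if __name__ == "__main__":
    secret = [1, 0, 1, 1, 0, 0, 1, 0]
    guess, count = guess_until_accepted(secret, 0.25, rng=random.Random(0))
    print(guess, count)
```

Averaging `count` (or its $\rho$-th power) over many runs gives an empirical view of the guesswork moments that the abstract refers to.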