Some Notes on the Sample Complexity of Approximate Channel Simulation
This addresses a bottleneck in machine learning-based data compression by enabling consistent encoding times, though it is incremental as it builds on prior work like Agustsson and Theis.
The paper tackles the problem of random runtime in channel simulation algorithms for lossy data compression by proposing approximate schemes with fixed runtime, showing that for some distributions, runtime scales super-polynomially, but with access to certain information, optimal coding can be achieved with sample complexity exp2((D_KL[Q || P] + o(1))/ε).
Channel simulation algorithms can efficiently encode random samples from a prescribed target distribution $Q$ and find applications in machine learning-based lossy data compression. However, algorithms that encode exact samples usually have random runtime, limiting their applicability when a consistent encoding time is desirable. Thus, this paper considers approximate schemes with a fixed runtime instead. First, we strengthen a result of Agustsson and Theis and show that there is a class of pairs of target distribution $Q$ and coding distribution $P$, for which the runtime of any approximate scheme scales at least super-polynomially in $D_\infty[Q \Vert P]$. We then show, by contrast, that if we have access to an unnormalised Radon-Nikodym derivative $r \propto dQ/dP$ and knowledge of $D_{KL}[Q \Vert P]$, we can exploit global-bound, depth-limited A* coding to ensure $\mathrm{TV}[Q \Vert P] \leq ε$ and maintain optimal coding performance with a sample complexity of only $\exp_2\big((D_{KL}[Q \Vert P] + o(1)) \big/ ε\big)$.