Gergely Flamich

IT
h-index10
12papers
139citations
Novelty58%
AI Score54

12 Papers

LGSep 29, 2023Code
RECOMBINER: Robust and Enhanced Compression with Bayesian Implicit Neural Representations

Jiajun He, Gergely Flamich, Zongyu Guo et al. · cambridge

COMpression with Bayesian Implicit NEural Representations (COMBINER) is a recent data compression method that addresses a key inefficiency of previous Implicit Neural Representation (INR)-based approaches: it avoids quantization and enables direct optimization of the rate-distortion performance. However, COMBINER still has significant limitations: 1) it uses factorized priors and posterior approximations that lack flexibility; 2) it cannot effectively adapt to local deviations from global patterns in the data; and 3) its performance can be susceptible to modeling choices and the variational parameters' initializations. Our proposed method, Robust and Enhanced COMBINER (RECOMBINER), addresses these issues by 1) enriching the variational approximation while retaining a low computational cost via a linear reparameterization of the INR weights, 2) augmenting our INRs with learnable positional encodings that enable them to adapt to local details and 3) splitting high-resolution data into patches to increase robustness and utilizing expressive hierarchical priors to capture dependency across patches. We conduct extensive experiments across several data modalities, showcasing that RECOMBINER achieves competitive results with the best INR-based methods and even outperforms autoencoder-based codecs on low-resolution images at low bitrates. Our PyTorch implementation is available at https://github.com/cambridge-mlg/RECOMBINER/.

LGJul 15, 2023
Minimal Random Code Learning with Mean-KL Parameterization

Jihao Andreas Lin, Gergely Flamich, José Miguel Hernández-Lobato · cambridge

This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.

MLOct 30, 2023
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo

Szilvia Ujváry, Gergely Flamich, Vincent Fortuin et al.

An important yet underexplored question in the PAC-Bayes literature is how much tightness we lose by restricting the posterior family to factorized Gaussian distributions when optimizing a PAC-Bayes bound. We investigate this issue by estimating data-independent PAC-Bayes bounds using the optimal posteriors, comparing them to bounds obtained using MFVI. Concretely, we (1) sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, (2) estimate its KL divergence from the prior with thermodynamic integration, and (3) propose three methods to obtain high-probability bounds under different assumptions. Our experiments on the MNIST dataset reveal significant tightness gaps, as much as 5-6\% in some cases.

ITMar 24
Data Compression with Relative Entropy Coding

Gergely Flamich

Over the last few years, machine learning unlocked previously infeasible features for compression, such as providing guarantees for users' privacy or tailoring compression to specific data statistics (e.g., satellite images or audio recordings of animals) or users' audiovisual perception. This, in turn, has led to an explosion of theoretical investigations and insights that aim to develop new fundamental theories, methods and algorithms better suited for machine learning-based compressors. In this thesis, I contribute to this trend by investigating relative entropy coding, a mathematical framework that generalises classical source coding theory. Concretely, relative entropy coding deals with the efficient communication of uncertain or randomised information. One of its key advantages is that it extends compression methods to continuous spaces and can thus be integrated more seamlessly into modern machine learning pipelines than classical quantisation-based approaches. Furthermore, it is a natural foundation for developing advanced compression methods that are privacy-preserving or account for the perceptual quality of the reconstructed data. The thesis considers relative entropy coding at three conceptual levels: After introducing the basics of the framework, (1) I prove results that provide new, maximally tight fundamental limits to the communication and computational efficiency of relative entropy coding; (2) I use the theory of Poisson point processes to develop and analyse new relative entropy coding algorithms, whose performance attains the theoretic optima and (3) I showcase the strong practical performance of relative entropy coding by applying it to image, audio, video and protein data compression using small, energy-efficient, probabilistic neural networks called Bayesian implicit neural representations.

COMay 12
Multi-Marginal Couplings for Metropolis-Hastings

Buu Phan, Gergely Flamich, Ashish Khisti et al.

Convergence diagnosis for Markov chain Monte Carlo is a matter of fundamental importance in computational statistics: it determines the resources allocated to a particular sampling problem and influences the practitioner's view of the quality of estimates obtained from a Markov chain. Motivated by this, we contribute to the emerging class of coupling-based convergence diagnostic algorithms. Concretely, we study coupling multiple Metropolis-Hastings chains using multi-marginal coupling. We introduce a natural objective for this setting and establish lower and upper bounds by drawing connections to list-level distribution coupling and distributed pairwise-matching problems. This analysis ultimately leads to a shared-randomness Poisson Monte Carlo construction for coupling multiple Markov chains. In this process, we avoid a key dimension-dependent bottleneck in the runtime complexity of classical Poisson Monte Carlo by developing an adaptive rule for updating the point process, yielding significant gains in high-dimensional settings. Experiments on grand couplings of Markov chains show that our methods improve coalescence rates across dimensions, reducing meeting times by up to 50% compared with existing baselines.

ITApr 25
Rejection Sampling is Optimal for Relative Entropy Coding

Spencer Hill, Fady Alajaji, Tamás Linder et al.

In relative entropy coding, a sender aims to design a stochastic code such that, on input $X \sim P_X$, the receiver can generate a sample $Y \sim P_{Y \mid X}$. It is a standard result that (1) this requires at least $I(X; Y)$ bits, (2) the lower bound is achievable within a logarithmic gap, and (3) this gap cannot be reduced in general. The necessity of the gap suggests that the mutual information is not the correct information measure to quantify the rate of relative entropy coding. A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of $I_F(X \to Y)$, a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most $I_F(X \to Y) + \log e$ bits. This demonstrates that rejection sampling is optimal for relative entropy coding. Our result implies that the classical mutual information lower bound is achievable within $\log(I(X; Y) + 1) + 2.45$ bits in general and within $1.45$ bits for singular channels, which are both the tightest bounds of their kind to date. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.

ITApr 7
Singular Relative Entropy Coding with Bits-Back Rejection Sampling

Gergely Flamich, Spencer Hill

A relative entropy code for a source $X \sim P_X$ is a stochastic code that encodes random samples from a prescribed $P_{Y \mid X}$ using as few bits as possible. A generalisation of entropy coding, it is a standard result that the minimum number of bits required to achieve this is at least the mutual information $I[X\,\Vert\,Y]$. However, a particularly fascinating feature of relative entropy coding compared to entropy coding is that, in general, this lower bound is only achievable to within an additional logarithmic factor. As such, an important research direction is to identify channels where we can reduce this gap. Sriramu and Wagner achieved such success by exhibiting a relative entropy code for so-called singular channels with sub-logarithmic asymptotic redundancy. However, their code is quite involved and, sadly, cannot be implemented in practice. In this paper, we construct the bits-back rejection sampler (BBRS), a relative entropy code that combines ideas from bits-back coding and (greedy) rejection sampling. Our analysis of BBRS reveals that the algorithm achieves the same asymptotic efficiency as Sriramu and Wagner's sampler, but with much simpler analysis and better constants. Moreover, BBRS can be implemented using standard relative entropy coding methods.

CLMar 31, 2025
You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation

Gergely Flamich, David Vilar, Jan-Thorsten Peter et al.

The goal of translation, be it by human or by machine, is, given some text in a source language, to produce text in a target language that simultaneously 1) preserves the meaning of the source text and 2) achieves natural expression in the target language. However, researchers in the machine translation community usually assess translations using a single score intended to capture semantic accuracy and the naturalness of the output simultaneously. In this paper, we build on recent advances in information theory to mathematically prove and empirically demonstrate that such single-score summaries do not and cannot give the complete picture of a system's true performance. Concretely, we prove that a tradeoff exists between accuracy and naturalness and demonstrate it by evaluating the submissions to the WMT24 shared task. Our findings help explain well-known empirical phenomena, such as the observation that optimizing translation systems for a specific accuracy metric (like BLEU) initially improves the system's naturalness, while ``overfitting'' the system to the metric can significantly degrade its naturalness. Thus, we advocate for a change in how translations are evaluated: rather than comparing systems using a single number, they should be compared on an accuracy-naturalness plane.

ITMay 7, 2024
Some Notes on the Sample Complexity of Approximate Channel Simulation

Gergely Flamich, Lennie Wells

Channel simulation algorithms can efficiently encode random samples from a prescribed target distribution $Q$ and find applications in machine learning-based lossy data compression. However, algorithms that encode exact samples usually have random runtime, limiting their applicability when a consistent encoding time is desirable. Thus, this paper considers approximate schemes with a fixed runtime instead. First, we strengthen a result of Agustsson and Theis and show that there is a class of pairs of target distribution $Q$ and coding distribution $P$, for which the runtime of any approximate scheme scales at least super-polynomially in $D_\infty[Q \Vert P]$. We then show, by contrast, that if we have access to an unnormalised Radon-Nikodym derivative $r \propto dQ/dP$ and knowledge of $D_{KL}[Q \Vert P]$, we can exploit global-bound, depth-limited A* coding to ensure $\mathrm{TV}[Q \Vert P] \leq ε$ and maintain optimal coding performance with a sample complexity of only $\exp_2\big((D_{KL}[Q \Vert P] + o(1)) \big/ ε\big)$.

ITMay 20, 2024
Accelerating Relative Entropy Coding with Space Partitioning

Jiajun He, Gergely Flamich, José Miguel Hernández-Lobato · cambridge

Relative entropy coding (REC) algorithms encode a random sample following a target distribution $Q$, using a coding distribution $P$ shared between the sender and receiver. Sadly, general REC algorithms suffer from prohibitive encoding times, at least on the order of $2^{D_{\text{KL}}[Q||P]}$, and faster algorithms are limited to very specific settings. This work addresses this issue by introducing a REC scheme utilizing space partitioning to reduce runtime in practical scenarios. We provide theoretical analyses of our method and demonstrate its effectiveness with both toy examples and practical applications. Notably, our method successfully handles REC tasks with $D_{\text{KL}}[Q||P]$ about three times greater than what previous methods can manage, and reduces the bitrate by approximately 5-15% in VAE-based lossless compression on MNIST and INR-based lossy compression on CIFAR-10, compared to previous methods, significantly improving the practicality of REC for neural compression.

LGMay 30, 2023
Compression with Bayesian Implicit Neural Representations

Zongyu Guo, Gergely Flamich, Jiajun He et al.

Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $β$-ELBO, and target different rate-distortion trade-offs for a given network architecture by adjusting $β$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.

ITOct 2, 2020
Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding

Gergely Flamich, Marton Havasi, José Miguel Hernández-Lobato

Variational Autoencoders (VAEs) have seen widespread use in learned image compression. They are used to learn expressive latent representations on which downstream compression methods can operate with high efficiency. Recently proposed 'bits-back' methods can indirectly encode the latent representation of images with codelength close to the relative entropy between the latent posterior and the prior. However, due to the underlying algorithm, these methods can only be used for lossless compression, and they only achieve their nominal efficiency when compressing multiple images simultaneously; they are inefficient for compressing single images. As an alternative, we propose a novel method, Relative Entropy Coding (REC), that can directly encode the latent representation with codelength close to the relative entropy for single images, supported by our empirical results obtained on the Cifar10, ImageNet32 and Kodak datasets. Moreover, unlike previous bits-back methods, REC is immediately applicable to lossy compression, where it is competitive with the state-of-the-art on the Kodak dataset.