PrismQuant: Rate-Distortion-Optimal Vector Quantization for Gaussian-Mixture Sources

Bumsu Park, Chanho Park, Youngmok Park, Namyoon Lee

arXiv:2605.15507

AI Analysis

Provides a constructive rate-distortion theory and practical codec for multimodal sources, addressing a long-standing gap in transform coding for non-Gaussian data.

PrismQuant targets rate-distortion-optimal vector quantization for Gaussian-mixture sources. The authors prove that the genie-aided conditional RD function is governed by a single global reverse-waterfilling level, then build a practical codec that closely approaches the theoretical bound on synthetic data and matches or outperforms transformer-based learned codecs on real-world CSI data with a model more than 10x smaller.

For a Gaussian source under mean-squared error (MSE), classical transform coding is rate-distortion (RD) optimal: the Karhunen-Loève transform (KLT) diagonalizes the covariance, reverse waterfilling allocates the bits, and scalar quantization closes the loop. This elegant story breaks down for multimodal sources, where no single covariance can capture heterogeneous local geometries and the RD function loses its closed form. We revisit this problem through Gaussian-mixture sources and develop a constructive RD theory for them. Our key finding is that the mixture structure incurs only a component-label cost. Conditioned on the active mixture component, each branch is Gaussian; the challenge is allocating bits across heterogeneous branches. We prove that the genie-aided conditional RD function is governed by a single global reverse-waterfilling level shared across all components and eigenmodes. Building on this result, we introduce PrismQuant, which transmits the component label losslessly and encodes the residual using the component-matched KLT followed by scalar quantization, achieving a rate within H(C)/n bits per source dimension of the converse, so the gap vanishes asymptotically. We further develop a practical implementation based on EM-driven Gaussian-mixture learning, component-adaptive KLTs, and entropy-constrained scalar quantization (ECSQ). Experiments on synthetic Gaussian mixtures show that PrismQuant closely approaches the theoretical RD bound, while experiments on real-world channel-state-information (CSI) data demonstrate competitive or superior performance compared with transformer-based learned codecs at more than one order of magnitude smaller model size.
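
The shared water level can be illustrated numerically. The sketch below is not the authors' code; it assumes the textbook reverse-waterfilling form applied jointly to all components, where mode i of component c has eigenvalue lam[c, i], receives distortion min(theta, lam[c, i]), and contributes max(0, 0.5 * log2(lam[c, i] / theta)) bits, each weighted by the component probability. The function names `global_waterfill` and `label_overhead`, the bisection search, and the toy eigenvalues are all illustrative assumptions; the paper's exact statement and notation may differ.

```python
import numpy as np


def global_waterfill(weights, eigvals, target_mse, iters=200):
    """Find one water level theta shared by all components and eigenmodes.

    weights    : (K,) mixture probabilities p_c (assumed to sum to 1)
    eigvals    : (K, n) covariance eigenvalues of each component
    target_mse : desired average MSE per source dimension
    Returns (theta, conditional rate in bits per source dimension).
    """
    w = np.asarray(weights, dtype=float)
    lam = np.asarray(eigvals, dtype=float)
    n = lam.shape[1]
    if target_mse <= 0:
        raise ValueError("target_mse must be positive")

    def mse(theta):
        # Average per-dimension distortion when every mode is clipped at theta.
        return float(np.sum(w[:, None] * np.minimum(theta, lam)) / n)

    lo, hi = 0.0, float(lam.max())
    if target_mse >= mse(hi):
        theta = hi  # target is at or above the source variance: zero rate
    else:
        # mse(theta) is nondecreasing in theta, so bisection applies.
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mse(mid) < target_mse:
                lo = mid
            else:
                hi = mid
        theta = 0.5 * (lo + hi)

    # Modes below the water level get ratio 1, i.e. zero bits.
    ratio = np.maximum(lam / theta, 1.0)
    rate = float(np.sum(w[:, None] * 0.5 * np.log2(ratio)) / n)
    return theta, rate


def label_overhead(weights, n):
    """H(C)/n: bits per source dimension spent on the lossless component label."""
    w = np.asarray(weights, dtype=float)
    w = w[w > 0]
    return float(-np.sum(w * np.log2(w)) / n)


if __name__ == "__main__":
    p = [0.5, 0.5]                       # hypothetical two-component mixture
    lam = [[4.0, 1.0], [1.0, 0.25]]      # hypothetical eigenvalues, n = 2
    theta, rate = global_waterfill(p, lam, target_mse=0.25)
    print("water level:", theta)
    print("conditional rate (bits/dim):", rate)
    print("label overhead H(C)/n (bits/dim):", label_overhead(p, 2))
```

For this toy mixture the water level settles at 0.25, giving a conditional rate of 1 bit per dimension and a label overhead of 0.5 bits per dimension. The overhead term shrinks as n grows for a fixed number of components, which matches the vanishing gap described in the abstract.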
