EMPEROR: Efficient Moment-Preserving Representation of Distributions
EMPEROR provides a mathematically rigorous and efficient alternative to heuristic pooling methods for capturing distributional information in neural networks, addressing a bottleneck in representation learning: summarizing a set of features without discarding its distributional structure.
The paper tackles the problem of representing high-dimensional probability measures in neural networks by introducing EMPEROR, a framework that encodes feature distributions through statistical moments. The result is a compact descriptor with determinacy guarantees and finite-sample error bounds that scale optimally with the number of slices and samples.
We introduce EMPEROR (Efficient Moment-Preserving Representation of Distributions), a mathematically rigorous and computationally efficient framework for representing high-dimensional probability measures arising in neural network representations. Unlike heuristic global pooling operations, EMPEROR encodes a feature distribution through its statistical moments. Our approach leverages the theory of sliced moments: features are projected onto multiple directions, lightweight univariate Gaussian mixture models (GMMs) are fit to each projection, and the resulting slice parameters are aggregated into a compact descriptor. We establish determinacy guarantees via Carleman's condition and the Cramér-Wold theorem, ensuring that the GMM is uniquely determined by its sliced moments, and we derive finite-sample error bounds that scale optimally with the number of slices and samples. Empirically, EMPEROR captures richer distributional information than common pooling schemes across various data modalities, while remaining computationally efficient and broadly applicable.
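The slicing pipeline described in the abstract (project, fit a univariate GMM per slice, aggregate the parameters) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `fit_gmm_1d` and `sliced_gmm_descriptor`, the random unit directions, the plain EM fit with quantile initialization, and the choice to concatenate sorted (weight, mean, variance) triples are all assumptions made for illustration.

```python
import numpy as np

def fit_gmm_1d(x, k=3, n_iter=50, eps=1e-6):
    """Fit a k-component univariate GMM to samples x with plain EM (illustrative)."""
    # Initialize means at evenly spaced quantiles, with shared variance and uniform weights.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var() + eps)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for numerical stability.
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances.
        nk = r.sum(axis=0) + eps
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + eps
    order = np.argsort(mu)  # sort components by mean for a canonical ordering
    return w[order], mu[order], var[order]

def sliced_gmm_descriptor(features, n_slices=8, k=3, seed=0):
    """Map an (n, d) feature set to a vector of length n_slices * 3 * k."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    # Assumption: slice directions are random unit vectors, fixed by the seed
    # so that descriptors of different feature sets are comparable.
    dirs = rng.normal(size=(n_slices, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = features @ dirs.T  # shape (n, n_slices): one 1-D sample set per slice
    parts = [np.concatenate(fit_gmm_1d(proj[:, s], k)) for s in range(n_slices)]
    return np.concatenate(parts)

# Usage: 500 feature vectors of dimension 16, summarized by 8 slices of 3 components.
feats = np.random.default_rng(1).normal(size=(500, 16))
desc = sliced_gmm_descriptor(feats, n_slices=8, k=3)
print(desc.shape)  # (72,)
```

The descriptor length depends only on the number of slices and mixture components, not on the number of input features, which is what makes such a summary usable as a drop-in replacement for a pooling layer in this sketch.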