Learning to Embed Distributions via Maximum Kernel Entropy
This work addresses a key bottleneck in kernel methods for distribution classification, namely the choice of kernel, and offers an approach that could benefit machine learning tasks in which each data point is represented as a distribution.
The paper tackles the challenge of selecting suitable kernels for distribution regression by proposing an unsupervised method to learn data-dependent distribution kernels via maximum kernel entropy, demonstrating improved performance across different modalities.
Empirical data can often be considered as samples from a set of probability distributions. Kernel methods have emerged as a natural approach for learning to classify these distributions. Although numerous kernels between distributions have been proposed, applying kernel methods to distribution regression tasks remains challenging, primarily because selecting a suitable kernel is not straightforward. Surprisingly, the question of learning a data-dependent distribution kernel has received little attention. In this paper, we propose a novel objective for the unsupervised learning of data-dependent distribution kernels, based on the principle of entropy maximization in the space of probability measure embeddings. We examine the theoretical properties of the latent embedding space induced by our objective and show that its geometric structure is well suited to solving downstream discriminative tasks. Finally, we demonstrate the performance of the learned kernel across different modalities.
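To make "entropy in the space of probability measure embeddings" concrete, here is a minimal sketch that evaluates such an objective for fixed inputs. It rests on assumptions that are not stated above. Each distribution is observed as a bag of samples and embedded by its empirical kernel mean under a Gaussian kernel with bandwidth sigma. A second Gaussian kernel with bandwidth gamma acts on the distances between those embeddings (the squared MMD) to give a distribution-level Gram matrix K. The "kernel entropy" is taken to be the second-order Rényi entropy of K normalized to unit trace. All function names are hypothetical. The learned encoder that the paper trains is omitted entirely, so this is not the authors' implementation.

```python
import math


def rbf(x, y, sigma):
    """Gaussian kernel between two points given as lists of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))


def mean_kernel(X, Y, sigma):
    """Inner product <mu_X, mu_Y> of empirical mean embeddings:
    the average of k(x, y) over all sample pairs."""
    return sum(rbf(x, y, sigma) for x in X for y in Y) / (len(X) * len(Y))


def mmd_sq(X, Y, sigma):
    """Squared MMD, i.e. ||mu_X - mu_Y||^2 in the RKHS (clamped at 0 for rounding)."""
    val = mean_kernel(X, X, sigma) + mean_kernel(Y, Y, sigma) - 2 * mean_kernel(X, Y, sigma)
    return max(0.0, val)


def distribution_gram(bags, sigma, gamma):
    """Assumed distribution kernel: K_ij = exp(-MMD^2(P_i, P_j) / (2 gamma^2))."""
    n = len(bags)
    return [
        [math.exp(-mmd_sq(bags[i], bags[j], sigma) / (2 * gamma ** 2)) for j in range(n)]
        for i in range(n)
    ]


def renyi2_entropy(K):
    """Second-order Renyi entropy of the trace-normalized Gram matrix,
    S_2 = -log tr((K / tr K)^2). K is symmetric, so tr(K^2) is the
    sum of its squared entries."""
    n = len(K)
    tr = sum(K[i][i] for i in range(n))
    frob_sq = sum(K[i][j] ** 2 for i in range(n) for j in range(n))
    return -math.log(frob_sq / tr ** 2)


if __name__ == "__main__":
    # Three well-separated bags of 1-D samples vs. three copies of one bag.
    spread = [[[0.0], [0.1]], [[10.0], [10.1]], [[20.0], [20.1]]]
    same = [[[0.0], [0.1]]] * 3
    print(renyi2_entropy(distribution_gram(spread, sigma=1.0, gamma=0.1)))
    print(renyi2_entropy(distribution_gram(same, sigma=1.0, gamma=0.1)))
```

Because the Gram matrix has a unit diagonal and entries in (0, 1], this entropy lies between 0, reached when all embeddings coincide and K is all ones, and log n, reached when K approaches the identity. Maximizing it therefore pushes the embeddings of different distributions apart. That is one way to read the claim that the induced geometry suits downstream discrimination. In a learned setting, the same quantity would be differentiated with respect to the parameters of the sample encoder.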