Learning with Succinct Common Representation Based on Wyner's Common Information
This work addresses the challenge of learning a succinct common representation of paired (bimodal) data, with applications to generative modeling and retrieval.
The paper tackles conditional and joint sample generation by proposing a bimodal generative model based on Wyner's common information, and evaluates it on synthetic and real-world datasets, including a zero-shot image retrieval task.
A new bimodal generative model is proposed for generating conditional and joint samples, together with a training method that learns a succinct bottleneck representation. The proposed model, dubbed the variational Wyner model, is designed based on two classical problems in network information theory -- distributed simulation and channel synthesis -- in which Wyner's common information arises as the fundamental limit on the succinctness of the common representation. The model is trained by minimizing the symmetric Kullback--Leibler divergence between the variational and model distributions, with regularization terms for common information, reconstruction consistency, and latent space matching; the optimization is carried out via an adversarial density ratio estimation technique. The utility of the proposed approach is demonstrated through experiments on joint and conditional generation with synthetic and real-world datasets, as well as on a challenging zero-shot image retrieval task.
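For reference, the two quantities named in the abstract can be written out using their standard textbook definitions. The symbols here ($X$, $Y$ for the two modalities, $U$ for the common representation, $p$ and $q$ for the model and variational distributions) are notation assumed for illustration and are not necessarily the paper's own. Wyner's common information is the smallest amount of information that a variable $U$ must carry about the pair $(X, Y)$ so that $X$ and $Y$ become conditionally independent given $U$:

\[
  C(X;Y) \;=\; \inf_{P_{U \mid XY} \,:\, X - U - Y} I(X, Y; U),
\]

where $X - U - Y$ denotes a Markov chain. The symmetric Kullback--Leibler divergence between two distributions $p$ and $q$ is the sum of the two directed divergences:

\[
  D_{\mathrm{sym}}(p, q) \;=\; D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p).
\]

Under this reading, the common-information regularizer mentioned above would penalize $I(X, Y; U)$ to keep the learned representation succinct; the exact form of the objective is given in the paper itself.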