ML LG | Jun 1, 2019

BreGMN: scaled-Bregman Generative Modeling Networks

Akash Srivastava, Kristjan Greenewald, Farzaneh Mirzazadeh

arXiv:1906.00313v1 | 4.96 citations

Originality Highly original

AI Analysis

This paper addresses a fundamental issue in generative modeling, offering machine learning researchers a novel approach to handling non-overlapping distributions without completely changing the objective function.

The paper tackles the support mismatch problem in generative modeling by proposing a method that augments the base measure of divergences, using Scaled Bregman Divergences to generalize f-divergences and Bregman divergences. It demonstrates promising results on MNIST, CelebA, and CIFAR-10 datasets.

The family of f-divergences is ubiquitously applied to generative modeling in order to adapt the distribution of the model to that of the data. Well-definedness of f-divergences, however, requires the distributions of the data and model to overlap completely at every step of training. As a result, as soon as the supports of the data and model distributions contain non-overlapping portions, gradient-based training of the corresponding model becomes hopeless. Recent advances in generative modeling are full of remedies for handling this support mismatch problem: key ideas include either modifying the objective function to integral probability metrics (IPMs), which are well-behaved even on distributions with disjoint supports, or optimizing a well-behaved variational lower bound instead of the true objective. We, on the other hand, establish that a complete change of the objective function is unnecessary, and that an augmentation of the base measure of the problematic divergence can instead resolve the issue. Based on this observation, we propose a generative model which leverages the class of Scaled Bregman Divergences and generalizes both f-divergences and Bregman divergences. We analyze this class of divergences and show that with the appropriate choice of base measure it can resolve the support mismatch problem and incorporate geometric information. Finally, we study the performance of the proposed method and demonstrate promising results on MNIST, CelebA and CIFAR-10 datasets.
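
To make the role of the base measure concrete, here is a minimal sketch on a four-point discrete space. It assumes the standard scaled Bregman form B_f(P, Q | M) = sum_i m_i [f(p_i/m_i) - f(q_i/m_i) - f'(q_i/m_i)(p_i/m_i - q_i/m_i)], which the abstract does not spell out, and uses f(t) = (t - 1)^2, for which M = Q recovers the Pearson chi-square f-divergence. The function name `scaled_bregman`, the toy distributions, and the mixture base measure are illustrative assumptions, not the paper's implementation or its actual choice of base measure.

```python
import math

def scaled_bregman(p, q, m, f, f_prime):
    """Discrete scaled Bregman divergence B_f(P, Q | M) (illustrative sketch).

    Sums m_i * [f(a_i) - f(b_i) - f'(b_i) * (a_i - b_i)] with a = p/m, b = q/m.
    Assumption: if M puts zero mass where P or Q has mass, the divergence is
    treated as infinite (it is not well-defined there).
    """
    total = 0.0
    for p_i, q_i, m_i in zip(p, q, m):
        if m_i == 0.0:
            if p_i > 0.0 or q_i > 0.0:
                return math.inf  # base measure fails to dominate P and Q
            continue
        a, b = p_i / m_i, q_i / m_i
        total += m_i * (f(a) - f(b) - f_prime(b) * (a - b))
    return total

def f(t):
    # Generator of the Pearson chi-square divergence.
    return (t - 1.0) ** 2

def f_prime(t):
    return 2.0 * (t - 1.0)

# Data and model distributions with disjoint supports.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]

# Base measure M = Q: the ordinary f-divergence, which blows up here.
d_base_q = scaled_bregman(p, q, q, f, f_prime)

# Base measure M = (P + Q) / 2 (hypothetical choice): covers both supports.
mix = [0.5 * (p_i + q_i) for p_i, q_i in zip(p, q)]
d_base_mix = scaled_bregman(p, q, mix, f, f_prime)

print(d_base_q, d_base_mix)
```

With the base measure set to Q the value is infinite, while the mixture base measure yields a finite value on the same pair of distributions. This is the effect the abstract describes: the divergence family stays the same and only the base measure changes.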
