LG · ML · Oct 29, 2018

Variational Inference with Tail-adaptive f-Divergence

arXiv:1810.11943v3 · 61 citations
Originality: Highly original
AI Analysis

This paper addresses a bottleneck in probabilistic machine learning: importance-sampling estimates used in variational inference with α-divergences can be unstable. It offers researchers and practitioners a more stable alternative to existing divergence-based methods.

The paper tackles the problem of large or infinite variance in importance sampling for variational inference with α-divergences by proposing tail-adaptive f-divergences that adapt to weight tails, ensuring finite moments and mass-covering properties. Results show significant advantages over KL and α-divergence methods in Bayesian neural networks and deep reinforcement learning.

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires importance sampling, which can have extremely large or infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments, while simultaneously achieving mass-covering properties. We test our methods on Bayesian neural networks, as well as deep reinforcement learning in which our method is applied to improve a recent soft actor-critic (SAC) algorithm. Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences.
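
To make the heavy-tail issue concrete, here is a minimal NumPy sketch. It draws samples from a proposal q = N(0, 1) against a wider target p = N(0, 2²), for which the raw importance weights w = p(x)/q(x) have an infinite second moment under q. It then contrasts them with weights derived only from the rank of each w. The rank-based rule (a power `beta` of the empirical survival function) and the name `rank_based_weights` are assumptions made for illustration of a tail-insensitive weighting; this is not presented as the paper's exact estimator.

```python
import numpy as np


def rank_based_weights(w, beta=-1.0):
    """Map raw importance weights to normalized, rank-based weights.

    Hypothetical helper for illustration. Each weight w_i is replaced by
    survival_i ** beta, where survival_i is the fraction of samples with
    w_j >= w_i. Because survival_i lies in [1/n, 1], the result is bounded
    no matter how heavy the tail of w is.
    """
    w = np.asarray(w, dtype=float)
    # survival[i] = (1/n) * #{j : w_j >= w_i}
    survival = (w[None, :] >= w[:, None]).mean(axis=1)
    gamma = survival ** beta
    return gamma / gamma.sum()


# Toy setup (assumed): proposal q = N(0, 1), target p = N(0, 2^2).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)
log_q = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * (x / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)

# Raw weights: E_q[w^2] is infinite here, so a few samples can dominate.
w = np.exp(log_p - log_q)
raw_share = w.max() / w.sum()

# Rank-based weights: the largest share is bounded by construction.
gamma = rank_based_weights(w)
rank_share = gamma.max()

print(f"largest normalized raw weight:        {raw_share:.4f}")
print(f"largest normalized rank-based weight: {rank_share:.4f}")
```

With `beta = -1`, samples with larger raw weights still receive larger rank-based weights, so under-covered regions of the target keep more influence, but no single sample can dominate the estimate.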

Code Implementations: 1 repo
