Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
This work addresses the challenge of incorporating unobserved structural information into generative models for applications such as speech synthesis, though the contribution appears incremental because it builds on existing variational Bayesian and VAE frameworks.
The authors address the problem of finding optimal paths in dynamic programming (DP) by introducing a probability-softening solution that transforms DP problems into directed acyclic graphs whose paths follow a Gibbs distribution. They apply this to variational autoencoders (VAEs) to capture structured sparse optimal paths as latent variables, which enables end-to-end training for generative tasks such as text-to-speech and singing voice synthesis.
We propose the stochastic optimal path, which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of dynamic programming (DP) problems into directed acyclic graphs in which all paths follow a Gibbs distribution. Using properties of the Gumbel distribution, we show that this Gibbs distribution is equivalent to a message-passing algorithm, and we give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE, which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. Finally, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at \url{https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP}.
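To make the idea of a Gibbs distribution over paths concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical four-node DAG, a hypothetical temperature parameter `alpha`, and a path probability proportional to `exp(alpha * total_weight)`. The forward pass propagates log-partition values with log-sum-exp, which is the location parameter of the maximum of independent Gumbel variables with those locations. The backward pass then samples a path edge by edge. All names (`EDGES_IN`, `forward_log_partition`, `sample_path`, `enumerate_paths`) are illustrative.

```python
import math
import random

# Hypothetical toy DAG (an assumption for illustration only).
# EDGES_IN[v] lists (u, w) pairs, meaning an edge u -> v with weight w.
# Node ids are assumed to be in topological order: 0 is the source, 3 the sink.
EDGES_IN = {
    1: [(0, 1.0)],
    2: [(0, 0.5), (1, 2.0)],
    3: [(1, 0.2), (2, 1.5)],
}
SOURCE, SINK = 0, 3


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_log_partition(edges_in, source, alpha=1.0):
    """Forward message passing: log_z[v] is the log of the summed
    exp(alpha * weight) over all paths from the source to v."""
    log_z = {source: 0.0}
    for v in sorted(edges_in):  # relies on ids being topologically ordered
        log_z[v] = logsumexp([log_z[u] + alpha * w for u, w in edges_in[v]])
    return log_z


def sample_path(edges_in, log_z, source, sink, alpha=1.0, rng=random):
    """Backward sampling: walk from the sink to the source, choosing each
    predecessor u of v with probability exp(log_z[u] + alpha*w - log_z[v])."""
    path, v = [sink], sink
    while v != source:
        preds = edges_in[v]
        probs = [math.exp(log_z[u] + alpha * w - log_z[v]) for u, w in preds]
        v = rng.choices([u for u, _ in preds], weights=probs)[0]
        path.append(v)
    return path[::-1]


def enumerate_paths(edges_in, source, sink):
    """Brute-force list of (path, total_weight) pairs, used only as a check."""
    def rec(v):
        if v == source:
            yield [source], 0.0
            return
        for u, w in edges_in[v]:
            for p, s in rec(u):
                yield p + [v], s + w
    return list(rec(sink))


log_z = forward_log_partition(EDGES_IN, SOURCE, alpha=1.0)
path = sample_path(EDGES_IN, log_z, SOURCE, SINK, alpha=1.0)
```

Under these assumptions, `log_z[SINK]` matches the log-partition function obtained by enumerating every path, and the product of the backward transition probabilities along a path equals that path's Gibbs probability. Increasing `alpha` concentrates the samples on the highest-weight path, which recovers the classical optimal path in the limit.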