LG · AI · Aug 10, 2021

Regularized Sequential Latent Variable Models with Adversarial Neural Networks

arXiv:2108.04496v13.13 citations · h-index: 53

Originality: Incremental advance

AI Analysis

This work addresses a problem of interest to machine learning researchers: modeling variability in sequential data such as speech and handwriting. It appears incremental, as it builds on existing VAE and adversarial methods.

The paper tackles the limited randomness of standard RNNs in modeling sequential data by incorporating high-level latent random variables and applying adversarial training under the VAE principle. The authors report a theoretical optimum for training, improved training stability, and a better posterior approximation; numerical results on TIMIT speech data show the reconstruction loss and the evidence lower bound converging to the same level and the adversarial training loss converging to 0.
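Why the reconstruction loss and the evidence lower bound (ELBO) can meet follows from the usual decomposition: the negative ELBO equals the reconstruction loss plus a KL term, and the KL is non-negative, so under that standard decomposition the two curves reaching the same level would mean the KL term has become small. The dependency-free sketch below assumes diagonal Gaussian posteriors, priors, and decoders at every timestep, as in VRNN-style models where the prior over z_t is conditioned on the RNN state. The function names and the per-step tuple layout are hypothetical; this is an illustration, not the paper's code.

```python
import math
from typing import List, Tuple

# A diagonal Gaussian as (means, log-variances); this parameterization is an assumption.
Gaussian = Tuple[List[float], List[float]]


def gaussian_kl(q: Gaussian, p: Gaussian) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    (mu_q, logvar_q), (mu_p, logvar_p) = q, p
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # 0.5 * [log(vp / vq) + (vq + (mq - mp)^2) / vp - 1]
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def gaussian_nll(x: List[float], mean: List[float], logvar: List[float]) -> float:
    """Negative log-likelihood of x under a diagonal Gaussian decoder (reconstruction loss)."""
    nll = 0.0
    for xi, m, lv in zip(x, mean, logvar):
        nll += 0.5 * (math.log(2 * math.pi) + lv + (xi - m) ** 2 / math.exp(lv))
    return nll


def sequence_neg_elbo(steps):
    """Sum per-timestep terms of a hypothetical VRNN-style sequence.

    Each step is (x_t, (decoder_mean, decoder_logvar), posterior_q_t, prior_p_t).
    Returns (reconstruction_loss, kl, negative_elbo) with negative_elbo = recon + kl.
    """
    recon, kl = 0.0, 0.0
    for x_t, (dec_mean, dec_logvar), q_t, p_t in steps:
        recon += gaussian_nll(x_t, dec_mean, dec_logvar)
        kl += gaussian_kl(q_t, p_t)
    return recon, kl, recon + kl


# Hypothetical two-step sequence: 2-D observations, 1-D latents.
steps = [
    ([0.1, -0.2], ([0.0, 0.0], [0.0, 0.0]), ([0.5], [-1.0]), ([0.0], [0.0])),
    ([0.3, 0.4], ([0.2, 0.3], [0.0, 0.0]), ([0.0], [0.0]), ([0.0], [0.0])),
]
recon, kl, neg_elbo = sequence_neg_elbo(steps)
print(f"recon={recon:.4f} kl={kl:.4f} -ELBO={neg_elbo:.4f}")
```

In the second step the posterior equals the prior, so that step contributes nothing to the KL and its reconstruction term alone makes up its share of the negative ELBO.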

Recurrent neural networks (RNNs), with richly distributed internal states and flexible non-linear transition functions, have overtaken dynamic Bayesian networks such as hidden Markov models (HMMs) in modeling highly structured sequential data. Such data, for example speech and handwriting, often contain complex relationships between the underlying factors of variation and the observed data. The standard RNN has very limited randomness or variability in its structure, coming only from the output conditional probability model. This paper presents different ways of using high-level latent random variables in an RNN to model the variability in sequential data, and a method for training such an RNN under the VAE (variational autoencoder) principle. We explore possible ways of using adversarial methods to train a variational RNN model. Unlike competing approaches, our approach has a theoretical optimum in model training and provides better training stability. It also improves the posterior approximation in the variational inference network through a separate adversarial training step. Numerical results on TIMIT speech data show that the reconstruction loss and the evidence lower bound converge to the same level and that the adversarial training loss converges to 0.
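One common way to add an adversarial step to a variational posterior, borrowed from adversarial autoencoders, is to train a discriminator to tell prior samples from approximate-posterior samples while the encoder learns to fool it. The abstract does not spell out the paper's objective, so the sketch below assumes this standard GAN formulation on 1-D latent samples with known Gaussian densities; every name and parameter is hypothetical, and the loss scaling here may not match the quantity the paper reports as converging to 0. It uses the closed-form optimal discriminator D*(z) = p(z) / (p(z) + q(z)) to show the well-known GAN optimum: when q matches p, D* is 1/2 everywhere and the discriminator loss equals 2 log 2.

```python
import math
import random
from typing import Callable, List

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Standard GAN discriminator loss: -mean log D(real) - mean log(1 - D(fake))."""
    real_term = -sum(math.log(d + EPS) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d + EPS) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def encoder_adversarial_loss(d_fake: List[float]) -> float:
    """Non-saturating encoder loss: push D(posterior samples) toward 1."""
    return -sum(math.log(d + EPS) for d in d_fake) / len(d_fake)


def normal_pdf(z: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def optimal_discriminator(z: float, prior_pdf: Callable[[float], float],
                          posterior_pdf: Callable[[float], float]) -> float:
    """D*(z) = p(z) / (p(z) + q(z)), labelling prior samples as 'real'."""
    p, q = prior_pdf(z), posterior_pdf(z)
    return p / (p + q)


def prior_pdf(z: float) -> float:
    return normal_pdf(z, 0.0, 1.0)  # assumed fixed N(0, 1) prior


def posterior_pdf(z: float) -> float:
    return normal_pdf(z, 0.5, 0.8)  # hypothetical mismatched posterior


rng = random.Random(0)
prior_samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
posterior_samples = [rng.gauss(0.5, 0.8) for _ in range(1000)]

d_real = [optimal_discriminator(z, prior_pdf, posterior_pdf) for z in prior_samples]
d_fake = [optimal_discriminator(z, prior_pdf, posterior_pdf) for z in posterior_samples]
print("mismatched D loss:", discriminator_loss(d_real, d_fake))
print("mismatched encoder loss:", encoder_adversarial_loss(d_fake))

# If the posterior matched the prior, D* would be 0.5 everywhere.
d_matched = [optimal_discriminator(z, prior_pdf, prior_pdf) for z in prior_samples]
matched_loss = discriminator_loss(d_matched, d_matched)
print("matched D loss:", matched_loss, "vs 2 log 2 =", 2 * math.log(2))
```

In a real model the discriminator would be a trained network rather than a density ratio, and this step would alternate with the ELBO updates; the closed form is used here only to make the optimum visible.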

