A Recurrent Latent Variable Model for Sequential Data
This work addresses the problem of capturing variability in sequential data such as speech, a question of interest to machine learning researchers. Its contribution is incremental, since it builds on existing variational autoencoder and RNN methods.
The authors model variability in structured sequential data by incorporating latent random variables into the RNN hidden state. The resulting variational RNN (VRNN) showed improved performance on speech and handwriting datasets.
In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamic hidden state.
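To make the idea of a latent random variable inside the recurrent state concrete, here is a minimal generative sketch in plain Python. It is not the authors' implementation: the class name `TinyLatentRNN`, the linear maps, the Gaussian prior, the tanh recurrence, and all dimensions are assumptions chosen for illustration, and the inference (encoder) network and training objective are omitted. The sketch only shows the structural point that a latent variable is sampled at every step conditioned on the previous hidden state, and that the sample feeds into the next hidden state.

```python
import math
import random


def linear(weights, vec):
    """Matrix-vector product for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (hypothetical initialization)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLatentRNN:
    """Toy RNN whose hidden-state update depends on a sampled latent z_t."""

    def __init__(self, x_dim, z_dim, h_dim, seed=0):
        self.rng = random.Random(seed)
        self.h_dim = h_dim
        # Prior over z_t conditioned on h_{t-1}: mean and log standard deviation.
        self.W_prior_mu = init_matrix(z_dim, h_dim, self.rng)
        self.W_prior_logstd = init_matrix(z_dim, h_dim, self.rng)
        # Decoder: output mean computed from the concatenation [z_t, h_{t-1}].
        self.W_dec = init_matrix(x_dim, z_dim + h_dim, self.rng)
        # Recurrence: h_t computed from the concatenation [x_t, z_t, h_{t-1}].
        self.W_rec = init_matrix(h_dim, x_dim + z_dim + h_dim, self.rng)

    def step(self, h_prev):
        mu = linear(self.W_prior_mu, h_prev)
        log_std = linear(self.W_prior_logstd, h_prev)
        # Reparameterized Gaussian sample: z = mu + std * eps.
        z = [m + math.exp(s) * self.rng.gauss(0.0, 1.0)
             for m, s in zip(mu, log_std)]
        # For simplicity the decoder mean is used directly as the output x_t.
        x = linear(self.W_dec, z + h_prev)
        # The sampled z_t enters the recurrence, so the hidden state is stochastic.
        h = [math.tanh(a) for a in linear(self.W_rec, x + z + h_prev)]
        return x, z, h

    def generate(self, steps):
        h = [0.0] * self.h_dim
        outputs = []
        for _ in range(steps):
            x, _, h = self.step(h)
            outputs.append(x)
        return outputs
```

Calling `TinyLatentRNN(x_dim=2, z_dim=3, h_dim=4).generate(5)` returns five two-dimensional outputs. Because `z` is resampled at every step, two models with different seeds produce different sequences even from identical weights and initial state, which is the kind of variability a deterministic RNN transition cannot express on its own.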