Discrete Autoencoders for Sequence Models
This work addresses the problem of poor representation extraction in sequence models for tasks like language modeling and machine translation, offering a novel method that is incremental in augmenting existing approaches.
The authors tackled the challenge of extracting meaningful hierarchical representations from sequence models by introducing an autoencoder with a discrete latent space, using an improved semantic hasashing technique to enable gradient propagation, and demonstrated its effectiveness through quantitative efficiency measures and applications like generating diverse translations.
Recurrent models for sequences have been recently successful at many tasks, especially for language modeling and machine translation. Nevertheless, it remains challenging to extract good representations from these models. For instance, even though language has a clear hierarchical structure going from characters through words to sentences, it is not apparent in current language models. We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space. In order to propagate gradients though this discrete representation we introduce an improved semantic hashing technique. We show that this technique performs well on a newly proposed quantitative efficiency measure. We also analyze latent codes produced by the model showing how they correspond to words and phrases. Finally, we present an application of the autoencoder-augmented model to generating diverse translations.