CL · AI · LG · Dec 15, 2023

Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation

arXiv:2312.11532v2 · 9 citations · h-index: 1 · Has Code · AAAI
Originality: Incremental advance
AI Analysis

This paper addresses topic modeling and document generation and is aimed at NLP researchers. It offers a novel method, but the improvement over existing VQ-VAE approaches is incremental.

The paper tackles topic modeling by using the latent codebooks of a VQ-VAE to capture the information in pre-trained embeddings. It proposes TVQ-VAE for topic-guided document generation, and its experiments show that the model captures topic context effectively and supports flexible forms of generation.

This paper introduces a novel approach for topic modeling utilizing latent codebooks from Vector-Quantized Variational Auto-Encoder (VQ-VAE), discretely encapsulating the rich information of the pre-trained embeddings such as the pre-trained language model. From the novel interpretation of the latent codebooks and embeddings as conceptual bag-of-words, we propose a new generative topic model called Topic-VQ-VAE (TVQ-VAE) which inversely generates the original documents related to the respective latent codebook. The TVQ-VAE can visualize the topics with various generative distributions including the traditional BoW distribution and the autoregressive image generation. Our experimental results on document analysis and image generation demonstrate that TVQ-VAE effectively captures the topic context which reveals the underlying structures of the dataset and supports flexible forms of document generation. Official implementation of the proposed TVQ-VAE is available at https://github.com/clovaai/TVQ-VAE.
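The "conceptual bag-of-words" idea in the abstract can be illustrated with a toy sketch. This is not the official TVQ-VAE implementation: it only shows the generic vector-quantization step (map each embedding to its nearest codebook entry) followed by counting code indices per document. The function names `nearest_code` and `conceptual_bow`, the fixed two-dimensional codebook, and the sample embeddings are all hypothetical; in a real VQ-VAE the codebook is learned, and the topic model built on top of the counts is omitted here.

```python
from collections import Counter
from typing import List, Sequence


def nearest_code(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector`
    under squared Euclidean distance (the usual VQ assignment rule)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def conceptual_bow(
    embeddings: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[int]:
    """Quantize every embedding of one document and count how often each
    code index occurs, giving a bag-of-words-style histogram over codes."""
    counts = Counter(nearest_code(e, codebook) for e in embeddings)
    return [counts.get(k, 0) for k in range(len(codebook))]


# Hypothetical toy data: a fixed 3-entry codebook (learned in a real model)
# and four token embeddings belonging to a single document.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
document_embeddings = [[0.1, -0.2], [0.9, 1.2], [1.1, 0.8], [4.7, 5.3]]

bow = conceptual_bow(document_embeddings, codebook)
print(bow)  # [1, 2, 1]
```

The resulting histogram plays the role that word counts play in a classical topic model, except that the "vocabulary" is the set of codebook indices rather than surface words.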

Code Implementations: 1 repo