CL, Jun 7, 2021

Unsupervised Representation Disentanglement of Text: An Evaluation on Synthetic Datasets

arXiv:2106.03631v1713 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of evaluating disentanglement methods for text data, providing a framework and datasets for researchers, but it is incremental as it adapts existing approaches to a new domain.

The paper tackles unsupervised representation disentanglement for text by evaluating models from the image domain on synthetic datasets. It finds that elements such as representation sparsity and the coupling between representation and decoder could affect disentanglement, and that a gap remains in the text domain.

To highlight the challenges of achieving representation disentanglement for the text domain in an unsupervised setting, in this paper we select a representative set of successfully applied models from the image domain. We evaluate these models on 6 disentanglement metrics, as well as on downstream classification tasks and homotopy. To facilitate the evaluation, we propose two synthetic datasets with known generative factors. Our experiments highlight the existing gap in the text domain and illustrate that certain elements, such as representation sparsity (as an inductive bias) or representation coupling with the decoder, could impact disentanglement. To the best of our knowledge, our work is the first attempt at the intersection of unsupervised representation disentanglement and text, and provides the experimental framework and datasets for examining future developments in this direction.

Code Implementations: 1 repo
