LG · Jun 26, 2024

Sequential Disentanglement by Extracting Static Information From A Single Sequence Element

arXiv:2406.18131v1 · 9 citations
Originality: Highly original
AI Analysis

This paper addresses the challenge of learning disentangled representations of sequential data, with applications such as video and audio processing, though it is an incremental improvement over existing methods.

The paper tackles information leakage in unsupervised sequential disentanglement by proposing an architecture that extracts the static information from a single sequence element, and it reports state-of-the-art results on generation and prediction tasks across multiple data modalities.

One of the fundamental representation learning tasks is unsupervised sequential disentanglement, where latent codes of inputs are decomposed to a single static factor and a sequence of dynamic factors. To extract this latent information, existing methods condition the static and dynamic codes on the entire input sequence. Unfortunately, these models often suffer from information leakage, i.e., the dynamic vectors encode both static and dynamic information, or vice versa, leading to a non-disentangled representation. Attempts to alleviate this problem via reducing the dynamic dimension and auxiliary loss terms gain only partial success. Instead, we propose a novel and simple architecture that mitigates information leakage by offering a simple and effective subtraction inductive bias while conditioning on a single sample. Remarkably, the resulting variational framework is simpler in terms of required loss terms, hyperparameters, and data augmentation. We evaluate our method on multiple data-modality benchmarks including general time series, video, and audio, and we show beyond state-of-the-art results on generation and prediction tasks in comparison to several strong baselines.
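To make the subtraction idea concrete, here is a toy sketch of one possible reading of the abstract. It is not the authors' architecture: the linear encoder `W`, the function name `split_static_dynamic`, and the choice of conditioning index `k` are all hypothetical, introduced only for illustration. The static code is read from a single sequence element, and that code is subtracted from every step's features before any dynamic factors would be computed.

```python
import numpy as np

def split_static_dynamic(x, W, k=0):
    """Toy illustration only (hypothetical, not the paper's model).

    x: (T, D) input sequence
    W: (D, H) stand-in linear encoder
    k: index of the single element the static code is conditioned on
    """
    h = x @ W      # per-step features, shape (T, H)
    s = h[k]       # static code taken from one sequence element
    d = h - s      # subtraction: remove the static part from every step
    return s, d

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))

# A sequence with no dynamics: the same frame repeated 5 times.
x_const = np.tile(rng.normal(size=(1, 4)), (5, 1))
s, d = split_static_dynamic(x_const, W)
```

In this toy setting, a sequence that never changes leaves nothing in the dynamic residual `d`, which is the intuition behind using subtraction to limit static information leaking into the dynamic codes.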

