LG | AI | CV | Aug 10, 2024

Sequential Representation Learning via Static-Dynamic Conditional Disentanglement

arXiv:2408.05599v1 | 4 citations | h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses the problem of learning disentangled representations from videos, which is relevant to researchers in machine learning and computer vision. It offers a novel theoretical framework and disentanglement constraint, though it appears incremental because it builds on existing disentanglement methods.

The paper tackles self-supervised disentangled representation learning for sequential data by separating time-independent (static) and time-varying (dynamic) factors in videos. It proposes a model that accounts for the causal relationship between the static and dynamic factors and uses Normalizing Flows to improve expressivity. Experiments show that it outperforms previous state-of-the-art techniques in scenarios where scene dynamics are influenced by content.

This paper explores self-supervised disentangled representation learning within sequential data, focusing on separating time-independent and time-varying factors in videos. We propose a new model that breaks the usual independence assumption between those factors by explicitly accounting for the causal relationship between the static and dynamic variables, and that improves model expressivity through additional Normalizing Flows. A formal definition of the factors is proposed. This formalism leads to the derivation of sufficient conditions for the ground truth factors to be identifiable, and to the introduction of a novel, theoretically grounded disentanglement constraint that can be directly and efficiently incorporated into our new framework. The experiments show that the proposed approach outperforms previous complex state-of-the-art techniques in scenarios where the dynamics of a scene are influenced by its content.
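
To make the "independence assumption" concrete, the following is a rough sketch of how the two modeling choices could be written as generative factorizations. The notation is an assumption for illustration and is not taken from the paper: $x_{1:T}$ denotes the video frames, $s$ the static (time-independent) latent, and $d_{1:T}$ the dynamic (time-varying) latents. The paper's exact factorization, and where its Normalizing Flows enter, may differ.

```latex
% Assumed notation: x_t = frame, s = static latent, d_t = dynamic latent.
% Usual independence assumption: the dynamic prior ignores s.
p(x_{1:T}, s, d_{1:T})
  = p(s) \prod_{t=1}^{T} p(d_t \mid d_{<t}) \, p(x_t \mid s, d_t)

% Static-dynamic conditional variant: the dynamic prior is conditioned on s,
% so the content of a scene can influence how it evolves.
p(x_{1:T}, s, d_{1:T})
  = p(s) \prod_{t=1}^{T} p(d_t \mid d_{<t}, s) \, p(x_t \mid s, d_t)
```

The only difference between the two lines is the extra conditioning on $s$ in the dynamic prior, which is one plausible reading of "explicitly accounting for the causal relationship between the static and dynamic variables."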

