LG · AI · CV · ML · Nov 15, 2023

Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations

ETH Zurich
arXiv:2311.08815v2 · 17 citations · h-index: 169
Originality: Incremental advance
AI Analysis

This paper addresses the performance degradation on downstream tasks that results from task-specific tuning of invariances in self-supervised learning. The advance is incremental, as it builds on prior work on disentanglement.

The paper proposes a self-supervised representation learning method that disentangles style features instead of discarding them. It adds multiple style embedding spaces, maximizes their joint entropy, and demonstrates benefits on both synthetic and real-world data.

Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To deal with this, current approaches try to retain some style information by tuning the degree of invariance to some particular task, such as ImageNet object classification. However, prior work has shown that such task-specific tuning can lead to significant performance degradation on other tasks that rely on the discarded style. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and individual style variables. We empirically demonstrate the benefits of our approach on both synthetic and real-world data.
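The key idea in the abstract (one embedding space per augmentation, each invariant to all-but-one augmentation, plus a joint entropy term) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that "invariant to all-but-one" is obtained by sharing one augmentation's parameter across the two views of a pair, and it swaps in the alignment and uniformity losses of Wang and Isola as stand-ins for the invariance and entropy terms. The names `sample_view_params`, `alignment`, `uniformity`, and `total_loss` are hypothetical.

```python
import numpy as np

def sample_view_params(num_augs, space, rng):
    """Sample augmentation parameters for the two views of one pair.

    space == 0: content space, every augmentation differs between views.
    space == m (1..num_augs): augmentation m-1 shares its parameter across
    the views, so that space is not pushed to be invariant to it (assumption).
    """
    first = rng.uniform(size=num_augs)
    second = rng.uniform(size=num_augs)
    if space > 0:
        second[space - 1] = first[space - 1]
    return first, second

def alignment(z1, z2):
    """Mean squared distance between paired embeddings (invariance term)."""
    return float(np.mean(np.sum((z1 - z2) ** 2, axis=1)))

def uniformity(z, t=2.0):
    """Log mean Gaussian potential over distinct pairs (entropy proxy)."""
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(z), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dist[upper]))))

def total_loss(pairs, joint, weight=1.0):
    """Sum per-space alignment and add the entropy proxy on the joint embedding.

    pairs: list of (z1, z2), one entry per embedding space.
    joint: concatenation of all spaces' embeddings of the same inputs.
    """
    align = sum(alignment(z1, z2) for z1, z2 in pairs)
    return align + weight * uniformity(joint)
```

In this sketch, lowering `uniformity` on the concatenated embedding spreads points apart across all spaces at once, which is one plausible way to discourage different spaces from encoding the same attribute. The weighting between the two terms is a free choice here and is not taken from the paper.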

