LG | CV | Aug 10, 2023

Shadow Datasets, New challenging datasets for Causal Representation Learning

arXiv:2308.05707v2 | 4 citations | h-index: 10
Originality: Synthesis-oriented
AI Analysis

This paper provides new, challenging benchmarks for researchers in causal representation learning, though the contribution is incremental because it builds on existing datasets.

The paper tackles the problem of limited datasets for evaluating causal representation learning (CRL) by proposing two new datasets with more generative factors and complex causal graphs, and modifying existing real datasets to better align with causal graphs.

Discovering causal relations among semantic factors is an emergent topic in representation learning. Most causal representation learning (CRL) methods are fully supervised, which is impractical because labeling is costly. To address this limitation, weakly supervised CRL methods were introduced. To evaluate CRL performance, four existing datasets are used: Pendulum, Flow, CelebA(BEARD), and CelebA(SMILE). However, existing CRL datasets are limited to simple graphs with few generative factors. We therefore propose two new datasets with a larger number of diverse generative factors and more sophisticated causal graphs. In addition, for the current real datasets, CelebA(BEARD) and CelebA(SMILE), the originally proposed causal graphs are not aligned with the dataset distributions, so we propose modifications to them.

Code Implementations: 1 repo