LG | Jan 29

Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models

arXiv:2601.21851v1 | 3.82 citations | h-index: 2

Originality: Highly original

AI Analysis

This paper addresses shortcut learning in foundation models, offering AI practitioners a scalable mitigation that requires neither group labels nor expensive gradient-based optimization, though it builds on existing counterfactual and disentanglement methods.

The paper tackles the problem of foundation models being vulnerable to spurious correlations by proposing Visual Disentangled Diffusion Autoencoders (DiDAE) for efficient counterfactual generation, achieving state-of-the-art performance in mitigating shortcut learning and improving downstream tasks on unbalanced datasets.

Foundation models, despite their robust zero-shot capabilities, remain vulnerable to spurious correlations and 'Clever Hans' strategies. Existing mitigation methods often rely on unavailable group labels or computationally expensive gradient-based adversarial optimization. To address these limitations, we propose Visual Disentangled Diffusion Autoencoders (DiDAE), a novel framework integrating frozen foundation models with disentangled dictionary learning for efficient, gradient-free counterfactual generation directly for the foundation model. DiDAE first edits foundation model embeddings in interpretable disentangled directions of the disentangled dictionary and then decodes them via a diffusion autoencoder. This allows the generation of multiple diverse, disentangled counterfactuals for each factual, much faster than existing baselines, which generate single entangled counterfactuals. When paired with Counterfactual Knowledge Distillation, DiDAE-CFKD achieves state-of-the-art performance in mitigating shortcut learning, improving downstream performance on unbalanced datasets.
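The embedding-editing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed step size, and the toy dictionary are all assumptions made for illustration, and the diffusion autoencoder that would decode each edited embedding back into an image is omitted. The sketch only shows how shifting one embedding along each dictionary direction, without any gradient computation, yields several candidate counterfactual embeddings from a single factual.

```python
from typing import List

Vector = List[float]


def edit_along_direction(embedding: Vector, direction: Vector, step: float) -> Vector:
    """Shift an embedding by `step` along the unit-normalised `direction`.

    Hypothetical helper: a plain additive edit, assumed here for illustration.
    """
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [e + step * d / norm for e, d in zip(embedding, direction)]


def candidate_counterfactuals(
    embedding: Vector, dictionary: List[Vector], step: float
) -> List[Vector]:
    """Produce one edited embedding per dictionary direction.

    No gradients are involved: each candidate is a single vector addition.
    """
    return [edit_along_direction(embedding, d, step) for d in dictionary]


# Toy example: a 3-dimensional "embedding" and two assumed dictionary directions.
factual = [1.0, 2.0, 3.0]
toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
edited = candidate_counterfactuals(factual, toy_dictionary, step=0.5)
```

In a full pipeline, each entry of `edited` would be passed to a decoder to obtain an image; here the decoder is left out because the listing does not specify its interface.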
