LG · May 18

Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing

arXiv:2605.19135 · 57.9
Predicted impact: top 40% in LG (last 90 days) · Originality: Highly original
AI Analysis

Provides theoretical identifiability guarantees for multimodal causal representation learning, a challenging problem with implications for interpretable and robust representation learning.

This work establishes component-wise identifiability guarantees for causal latent representations in multimodal data with partially shared latent structures, without parametric assumptions. The proposed Wasserstein-based module outperforms SOTA methods on synthetic and realistic datasets.

Causal representation learning (CRL) seeks to uncover meaningful latent variables and their causal structure from high-dimensional observational data. Identifiability is a crucial property in CRL, as it guarantees recovery of the mechanisms behind the data-generating process, and hence the interpretability and robustness of the learned representation. Proving identifiability in CRL is intrinsically difficult, and in this work we address an even more challenging setting: multimodality. We consider multimodal observed data with a partially shared latent structure: each modality is generated, through a nonlinear mixing function, from its own subset of the causal latent variables. Under flexible assumptions, and without imposing any parametric distribution on the latent variables, we establish component-wise identifiability guarantees for the causal latent representation. Furthermore, our identifiability results apply to the undercomplete scenario, in which each modality has more observed variables than latent variables. To instantiate our theoretical analysis, we introduce a Wasserstein-based module that recovers the partially shared latent structure. Because it is differentiable, the module can be integrated into a wide range of architectures with only minimal changes. Extensive experiments on synthetic and realistic datasets show that our approach outperforms state-of-the-art (SOTA) methods.
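The abstract does not spell out the generative model or the internals of the Wasserstein module, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' method. It assumes two modalities, one shared latent `z_s` plus one modality-specific latent each (`z_a`, `z_b`), hand-picked injective nonlinear mixings into 3 observed variables per modality (the undercomplete case), and a sliced 2-Wasserstein distance, swapped in here for concreteness, as an alignment score between the latent blocks two encoders claim are shared. All names (`sample_partially_shared`, `wasserstein2_1d`, `sliced_wasserstein2`) are hypothetical.

```python
import math
import random


def sample_partially_shared(n, seed=0):
    """Toy two-modality data with a partially shared latent structure.

    Hypothetical latents: z_s is shared by both modalities, z_a only
    generates modality A, z_b only generates modality B. The causal
    edges z_s -> z_a and z_s -> z_b are an illustrative choice.
    Each modality maps 2 latents to 3 observations (undercomplete).
    """
    rng = random.Random(seed)
    xa, xb, latents = [], [], []
    for _ in range(n):
        z_s = rng.gauss(0.0, 1.0)
        z_a = math.tanh(z_s) + 0.5 * rng.gauss(0.0, 1.0)
        z_b = z_s ** 2 - 1.0 + 0.5 * rng.gauss(0.0, 1.0)
        # Nonlinear mixings; the first two outputs of each modality already
        # determine its two latents, so both maps are injective.
        xa.append([math.tanh(z_s + z_a), z_s - 0.5 * z_a, math.sin(z_a) * z_s])
        xb.append([math.tanh(z_s - z_b), z_s + 2.0 * z_b, z_b ** 3 + z_s])
        latents.append([z_s, z_a, z_b])
    return xa, xb, latents


def wasserstein2_1d(xs, ys):
    """Empirical 2-Wasserstein distance between two equal-size 1D samples.

    In 1D the optimal coupling of two uniform empirical measures matches
    sorted samples, so W2^2 is the mean squared gap between order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    sx, sy = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sx, sy)) / len(sx))


def sliced_wasserstein2(X, Y, n_projections=64, seed=0):
    """Sliced 2-Wasserstein distance between two point clouds in R^d.

    Each random unit direction reduces the problem to the 1D case; squared
    1D distances are averaged over directions.
    """
    d = len(X[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        px = [sum(t * v for t, v in zip(theta, x)) for x in X]
        py = [sum(t * v for t, v in zip(theta, y)) for y in Y]
        total += wasserstein2_1d(px, py) ** 2
    return math.sqrt(total / n_projections)


if __name__ == "__main__":
    xa, xb, z = sample_partially_shared(500, seed=1)
    # Stand-ins for encoder outputs (hypothetical): the true shared latent
    # recovered from each modality with small independent estimation noise.
    rng = random.Random(2)
    shared_a = [[zi[0] + 0.05 * rng.gauss(0.0, 1.0)] for zi in z]
    shared_b = [[zi[0] + 0.05 * rng.gauss(0.0, 1.0)] for zi in z]
    specific_b = [[zi[2]] for zi in z]
    print(sliced_wasserstein2(shared_a, shared_b))    # small: both blocks encode z_s
    print(sliced_wasserstein2(shared_a, specific_b))  # larger: mismatched blocks
```

Sorting is piecewise linear in its inputs, so this distance is differentiable almost everywhere; written in an autodiff framework it could serve as a training loss that pulls the shared blocks of different modalities together. Note that it compares distributions rather than paired samples, which is one plausible reading of a "Wasserstein-based module", not a description of the paper's architecture.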
