LG · Sep 4, 2025

Interpretable Clustering with Adaptive Heterogeneous Causal Structure Learning in Mixed Observational Data

arXiv:2509.04415v2 · 2 citations · h-index: 4

Originality: Highly original

AI Analysis

This work addresses the challenge of distinguishing true causal heterogeneity from spurious associations in domains such as biology and medicine, offering an interpretable method for mechanism-level discovery.

The paper tackles the problem of identifying causal heterogeneity in mixed-type observational data. It proposes an unsupervised framework that jointly infers clusters and their causal structures, reports superior performance on clustering and structure learning tasks, and recovers biologically meaningful mechanisms in real-world data.

Understanding causal heterogeneity is essential for scientific discovery in domains such as biology and medicine. However, existing methods lack causal awareness, with insufficient modeling of heterogeneity, confounding, and observational constraints, leading to poor interpretability and difficulty distinguishing true causal heterogeneity from spurious associations. We propose an unsupervised framework, HCL (Interpretable Causal Mechanism-Aware Clustering with Adaptive Heterogeneous Causal Structure Learning), that jointly infers latent clusters and their associated causal structures from mixed-type observational data without requiring temporal ordering, environment labels, interventions, or other prior knowledge. HCL relaxes the homogeneity and sufficiency assumptions by introducing an equivalent representation that encodes both structural heterogeneity and confounding. It further develops a bi-directional iterative strategy to alternately refine causal clustering and structure learning, along with a self-supervised regularization that balances cross-cluster universality and specificity. Together, these components enable convergence toward interpretable, heterogeneous causal patterns. Theoretically, we show identifiability of heterogeneous causal structures under mild conditions. Empirically, HCL achieves superior performance in both clustering and structure learning tasks, and recovers biologically meaningful mechanisms in real-world single-cell perturbation data, demonstrating its utility for discovering interpretable, mechanism-level causal heterogeneity.
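
The alternating idea in the abstract (refine clusters, then refine each cluster's structure, and repeat) can be illustrated with a deliberately simplified toy. This is not the HCL algorithm: as an assumption for illustration, each cluster's "mechanism" is reduced to a single linear slope through the origin rather than a causal graph, assignments are hard, and there is no confounding model or regularization. All names (`fit_slope`, `alternate`, `make_toy_data`) are hypothetical.

```python
import random


def fit_slope(points):
    """Least-squares slope through the origin for a list of (x, y) pairs."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx if sxx > 0 else 0.0


def alternate(points, n_clusters=2, n_iters=20, seed=0):
    """Toy alternating refinement (a stand-in, NOT the paper's method).

    Structure step: refit one slope per cluster from its current members.
    Clustering step: reassign each sample to the best-fitting slope.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in points]  # random start
    slopes = [0.0] * n_clusters
    for _ in range(n_iters):
        # Structure step: update each cluster's mechanism.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous slope if a cluster is empty
                slopes[k] = fit_slope(members)
        # Clustering step: pick the mechanism with the smallest residual.
        new_labels = [
            min(range(n_clusters), key=lambda k: (y - slopes[k] * x) ** 2)
            for x, y in points
        ]
        if new_labels == labels:  # assignments stable, stop early
            break
        labels = new_labels
    return labels, slopes


def make_toy_data(n_per_cluster=100, seed=1):
    """Two hidden groups that share variables but differ in mechanism."""
    rng = random.Random(seed)
    points, truth = [], []
    for k, slope in enumerate((2.0, -2.0)):
        for _ in range(n_per_cluster):
            x = rng.uniform(1.0, 3.0)
            points.append((x, slope * x + rng.gauss(0.0, 0.1)))
            truth.append(k)
    return points, truth


if __name__ == "__main__":
    data, truth = make_toy_data()
    labels, slopes = alternate(data)
    print("estimated slopes:", sorted(slopes))
```

In this toy, the two groups are indistinguishable by the marginal distribution of `x` and separate only through the relationship between `x` and `y`, which is the sense in which clustering by mechanism differs from clustering by feature similarity. Cluster labels are recovered only up to permutation.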
