Causally-Aware Unsupervised Feature Selection Learning
This work addresses the selection of relevant features from unlabeled high-dimensional data, a problem of interest to machine learning researchers and practitioners. It offers an incremental improvement by incorporating causal awareness into unsupervised feature selection.
The paper argues that existing unsupervised feature selection methods overlook causal mechanisms, which leads to the selection of irrelevant features and poor interpretability. It proposes CAUSE-FS, a method that combines a causal regularizer with causality-guided hierarchical clustering, and reports experiments in which CAUSE-FS outperforms state-of-the-art methods.
Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
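The sample-reweighting idea behind the causal regularizer can be illustrated with a minimal sketch. This is not the CAUSE-FS implementation. It assumes that one feature is treated as the "treatment" and binarized at its median, that all remaining features act as confounders, and that balance is measured by the squared gap between the weighted confounder means of the two groups. The function names `imbalance` and `balance_weights`, the squared-weight parametrization, and the backtracking gradient descent are illustrative choices, not taken from the paper.

```python
import numpy as np

def imbalance(X_conf, t, w):
    """Squared gap between weighted confounder means of treated and control samples."""
    mu1 = X_conf.T @ (w * t) / (w @ t)
    mu0 = X_conf.T @ (w * (1 - t)) / (w @ (1 - t))
    return float(np.sum((mu1 - mu0) ** 2))

def balance_weights(X, j, n_steps=300, lr=1.0):
    """Learn non-negative sample weights that balance all features except j
    across the two groups obtained by splitting feature j at its median
    (the median split is an assumption made for this sketch)."""
    t = (X[:, j] > np.median(X[:, j])).astype(float)
    X_conf = np.delete(X, j, axis=1)
    u = np.ones(X.shape[0])            # w = u**2 keeps the weights non-negative
    loss = imbalance(X_conf, t, u ** 2)
    for _ in range(n_steps):
        w = u ** 2
        s1, s0 = w @ t, w @ (1 - t)
        mu1 = X_conf.T @ (w * t) / s1
        mu0 = X_conf.T @ (w * (1 - t)) / s0
        d = mu1 - mu0
        # Gradient of the imbalance with respect to w, then chain rule through w = u**2.
        grad_w = 2 * (t * ((X_conf - mu1) @ d) / s1
                      - (1 - t) * ((X_conf - mu0) @ d) / s0)
        grad_u = grad_w * 2 * u
        # Backtracking: halve the step until the imbalance actually decreases.
        step = lr
        while step > 1e-12:
            u_new = u - step * grad_u
            new_loss = imbalance(X_conf, t, u_new ** 2)
            if new_loss < loss:
                u, loss = u_new, new_loss
                break
            step /= 2
    w = u ** 2
    return w / w.mean(), t             # rescale so the weights sum to n

# Toy data: features 0 and 1 share a hidden common cause z; feature 2 is pure noise.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([
    z + 0.5 * rng.normal(size=200),
    z + 0.5 * rng.normal(size=200),
    rng.normal(size=200),
])

w, t = balance_weights(X, j=0)
X_conf = np.delete(X, 0, axis=1)
print("imbalance with uniform weights:", imbalance(X_conf, t, np.ones(len(t))))
print("imbalance with learned weights:", imbalance(X_conf, t, w))
```

In a full method, weights of this kind would presumably be learned for every treatment feature and used inside the regression objective, so that a feature's coefficient reflects its association with the clustering labels after the other features have been balanced. The sketch covers only the balancing step for a single feature.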