LG · AI · ML · Jun 9, 2020

Stable Prediction via Leveraging Seed Variable

arXiv:2006.05076v1
AI Analysis

This paper addresses prediction instability in machine learning when the test data distribution differs from the training distribution, a problem that is crucial for real-world applications, though the work builds incrementally on existing causal inference approaches.

The paper tackles stable prediction across unknown test data whose distribution differs from the training data. It uses a seed variable to separate causal variables from spuriously correlated ones, and reports that its algorithm outperforms state-of-the-art methods in experiments on synthetic and real-world datasets.

In this paper, we focus on the problem of stable prediction across unknown test data, where the test distribution is agnostic and might be totally different from the training one. In such a case, previous machine learning methods might exploit subtle spurious correlations in the training data, induced by non-causal variables, for prediction. These spurious correlations can change across datasets, which makes predictions unstable. To address this problem, we assume that the relationships between the causal variables and the response variable are invariant across data, and we propose a conditional independence test based algorithm that separates the causal variables using a seed variable as a prior and adopts them for stable prediction. By assuming independence between causal and non-causal variables, we show, both theoretically and empirically, that our algorithm can precisely separate causal and non-causal variables for stable prediction across test data. Extensive experiments on both synthetic and real-world datasets demonstrate that our algorithm outperforms state-of-the-art methods for stable prediction.
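To make the idea of seed-based separation concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not say which conditional independence test is used or how the data are generated, so everything below is an assumption made for illustration. The sketch assumes linear-Gaussian toy data in which a non-causal variable V is independent of the causal variables S but becomes correlated with the response through sample selection, and it uses a Fisher z-test on partial correlation as the conditional independence test. The decision rule (a variable is kept as causal if it is dependent on the seed given the response) and the names `partial_corr_pvalue`, `separate_by_seed`, and `make_toy_data` are hypothetical.

```python
import math
import numpy as np


def partial_corr_pvalue(x, y, z):
    """Fisher z-test p-value for the partial correlation of x and y given z."""
    design = np.column_stack([np.ones_like(z), z])
    # Residualize x and y on the conditioning variable with least squares.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    # One conditioning variable, so the scale factor is sqrt(n - 1 - 3).
    stat = math.atanh(r) * math.sqrt(len(x) - 1 - 3)
    return math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal p-value


def separate_by_seed(X, y, seed_idx, alpha=0.001):
    """Assumed rule: column j is causal if it depends on the seed given y."""
    causal, non_causal = [seed_idx], []
    for j in range(X.shape[1]):
        if j == seed_idx:
            continue
        p = partial_corr_pvalue(X[:, seed_idx], X[:, j], y)
        (causal if p < alpha else non_causal).append(j)
    return sorted(causal), sorted(non_causal)


def make_toy_data(n=20000, rng_seed=0):
    """Toy data (an assumption): columns 0-2 cause y, column 3 does not."""
    rng = np.random.default_rng(rng_seed)
    S = rng.normal(size=(n, 3))   # causal variables
    V = rng.normal(size=n)        # non-causal, independent of S
    y = S.sum(axis=1) + 0.5 * rng.normal(size=n)
    # Biased sample selection induces a spurious correlation between V and y.
    keep = np.abs(y - V) < 1.0
    return np.column_stack([S, V])[keep], y[keep]


X, y = make_toy_data()
causal, non_causal = separate_by_seed(X, y, seed_idx=0)
print("causal columns:", causal)
print("non-causal columns:", non_causal)
```

In this toy setup, the causal columns are jointly constrained by the response, so conditioning on it makes them dependent on the seed, while the selected-in variable V carries no further information about the seed once the response is known. A predictor fitted only on the columns returned in `causal` would then ignore the selection-induced correlation. With non-linear or non-Gaussian data, a partial-correlation test would no longer be an adequate conditional independence test.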

