AI · Aug 28, 2015

Mining Combined Causes in Large Data Sets

arXiv:1508.07092v2 · 18 citations
AI Analysis

This paper addresses a practical causal discovery problem for researchers and practitioners in fields such as data science and statistics. Its contribution is incremental: it targets one specific bottleneck in existing methods, their inability to discover combined causes at a feasible computational cost.

The paper tackles the problem of detecting combined causes (multi-factor causes where individual variables are not causes) in large observational data sets, proposing a novel approach that achieves high-quality causal discoveries with high computational efficiency in experiments on synthetic and real-world data.

In recent years, many methods have been developed for detecting causal relationships in observational data, and some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e., a multi-factor cause consisting of two or more component variables that individually are not causes. A straightforward way to uncover a combined cause is to include both individual and combined variables in causal discovery with existing methods, but this scheme is computationally infeasible because of the huge number of combined variables. In this paper, we propose a novel approach to this practical causal discovery problem, i.e., mining combined causes in large data sets. Experiments with both synthetic and real-world data sets show that the proposed method obtains high-quality causal discoveries with high computational efficiency.
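The two claims in the abstract can be illustrated with a small sketch. It is not the paper's method: the function names, the XOR toy data, and the use of a plain risk difference as the association measure are all assumptions made for illustration, and a risk difference shows association only, not causation. The first function counts the candidate combined variables that a brute-force scheme would have to consider; the second shows a case where neither variable alone is associated with the outcome but one combination of values is.

```python
from itertools import product
from math import comb


def count_combined_variables(num_vars, max_size):
    """Count variable subsets of size 2..max_size.

    Each subset is one candidate combined variable (hypothetical
    brute-force enumeration, not the paper's algorithm).
    """
    return sum(comb(num_vars, size) for size in range(2, max_size + 1))


def risk_difference(rows, exposed):
    """P(z=1 | exposed) - P(z=1 | not exposed) over (x, y, z) rows."""
    exposed_z = [z for x, y, z in rows if exposed(x, y)]
    unexposed_z = [z for x, y, z in rows if not exposed(x, y)]
    return sum(exposed_z) / len(exposed_z) - sum(unexposed_z) / len(unexposed_z)


# Toy data (assumed): every (x, y) pair appears once and z = x XOR y.
rows = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

# Individually, x and y show no association with z.
single_x = risk_difference(rows, lambda x, y: x == 1)
single_y = risk_difference(rows, lambda x, y: y == 1)

# The combined variable (x=1 AND y=0) is strongly associated with z.
combined = risk_difference(rows, lambda x, y: x == 1 and y == 0)

print(single_x, single_y, combined)
print(count_combined_variables(100, 3))  # 4950 pairs + 161700 triples = 166650
```

Even with only 100 variables and combinations limited to three components, the brute-force candidate set already exceeds 160,000 subsets, before accounting for the different value assignments each subset can take. This is the growth that makes the straightforward scheme impractical on large data sets.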

