DB AI LG PF Sep 12, 2016

ZaliQL: A SQL-Based Framework for Drawing Causal Inference from Big Data

arXiv:1609.03540v2, 2 citations
Originality: Incremental advance
AI Analysis

This work addresses scalability issues in causal inference for researchers and practitioners dealing with big data, representing an incremental improvement by adapting existing methods to a database engine.

The paper tackles the problem of scaling causal inference from observational data to large datasets. It introduces ZaliQL, a SQL-based framework that supports state-of-the-art methods and includes optimization techniques that speed them up. An evaluation on real datasets shows significant performance gains.

Causal inference from observational data is a subject of active research and development in statistics and computer science. Many toolkits that depend on statistical software have been developed for this purpose. However, these toolkits do not scale to large datasets. In this paper we describe a suite of techniques for expressing causal inference tasks from observational data in SQL. This suite supports the state-of-the-art methods for causal inference and runs at scale within a database engine. In addition, we introduce several optimization techniques that significantly speed up causal inference, in both the online and offline settings. We evaluate the quality and performance of our techniques through experiments on real datasets.
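
To make "expressing causal inference tasks in SQL" concrete, here is a minimal sketch of one common technique, exact matching (subclassification) on a single discrete covariate, written as a SQL query. It is an illustration only and is not taken from ZaliQL: the table name `units`, the columns `t` (treatment flag), `x` (covariate), and `y` (outcome), and the toy rows are all assumptions. The query is run through Python's built-in `sqlite3` module purely so that the example is self-contained.

```python
import sqlite3

# Hypothetical toy data: (unit id, treated flag, covariate value, outcome).
rows = [
    (1, 1, "a", 10.0),
    (2, 0, "a", 6.0),
    (3, 0, "a", 8.0),
    (4, 1, "b", 20.0),
    (5, 1, "b", 22.0),
    (6, 0, "b", 15.0),
    (7, 1, "c", 30.0),  # no control unit shares x = 'c', so this row is dropped
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (id INTEGER, t INTEGER, x TEXT, y REAL)")
conn.executemany("INSERT INTO units VALUES (?, ?, ?, ?)", rows)

# Group units into strata by covariate value, keep only strata that contain
# both treated and control units, then average the per-stratum difference in
# mean outcomes, weighting each stratum by its size.
query = """
WITH strata AS (
    SELECT x,
           AVG(CASE WHEN t = 1 THEN y END) AS avg_treated,
           AVG(CASE WHEN t = 0 THEN y END) AS avg_control,
           COUNT(*) AS n
    FROM units
    GROUP BY x
    HAVING SUM(t) > 0 AND SUM(1 - t) > 0
)
SELECT SUM((avg_treated - avg_control) * n) / SUM(n) AS ate
FROM strata
"""
ate = conn.execute(query).fetchone()[0]
print(ate)
```

On this toy data the two retained strata have differences of 3 and 6 with three units each, so the estimate is 4.5. Because the whole computation is a single grouped aggregation, a database engine can run it without moving the data into a separate statistical package, which is the kind of scalability argument the abstract makes.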
