LG · DATA-AN · ML · Oct 19, 2020

Causal Discovery using Compression-Complexity Measures

arXiv:2010.09336v3 · 17 citations
Originality: Incremental advance
AI Analysis

This work addresses causal inference in domains such as genomics, offering a method for sequence pairs that lack temporal structure. It appears incremental, since it builds on existing compression techniques.

The paper tackles the problem of inferring causal direction from discrete symbolic sequences. It uses compression-complexity measures to infer context-free grammars and compares how well the grammar inferred from each sequence compresses the other, achieving performance competitive with state-of-the-art methods on synthetic and real-world benchmarks. It also demonstrates applications to SARS-CoV-2 genome sequences, where the models capture directed causal information exchange between sequence pairs.
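Lempel-Ziv (LZ) complexity, one of the compression-complexity measures the paper builds on, counts how many new phrases appear when a sequence is parsed left to right, where each phrase is the longest prefix that can be copied from earlier text plus one innovating symbol. The sketch below follows the 1976 Lempel-Ziv definition under the assumption that input is a Python string; `lz76_complexity` is a hypothetical helper written for illustration (a deliberately simple, roughly cubic-time version), not the paper's implementation.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    n = len(s)
    c = 0  # phrase count
    p = 0  # start index of the current phrase
    while p < n:
        length = 1
        # Extend the phrase while it can still be copied from the text that
        # precedes its final symbol (the copy may overlap the phrase itself).
        while p + length <= n and s[p:p + length] in s[:p + length - 1]:
            length += 1
        # The phrase ends at its first non-reproducible symbol (or at end of input).
        c += 1
        p += length
    return c


print(lz76_complexity("0001101001000101"))  # 6
```

For this classic example the parsing is 0 · 001 · 10 · 100 · 1000 · 101, giving six phrases; more regular sequences (e.g. "0000", parsed as 0 · 000) yield lower counts.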

Causal inference is one of the most fundamental problems across all domains of science. We address the problem of inferring a causal direction from two observed discrete symbolic sequences $X$ and $Y$. We present a framework that relies on lossless compressors for inferring context-free grammars (CFGs) from sequence pairs and quantifies the extent to which the grammar inferred from one sequence compresses the other sequence. We infer that $X$ causes $Y$ if the grammar inferred from $X$ compresses $Y$ better than the grammar inferred from $Y$ compresses $X$. To put this notion into practice, we propose three models that use the Compression-Complexity Measures (CCMs), namely Lempel-Ziv (LZ) complexity and Effort-To-Compress (ETC), to infer CFGs and discover causal directions without demanding temporal structures. We evaluate these models on synthetic and real-world benchmarks and empirically observe performance competitive with current state-of-the-art methods. Lastly, we present two unique applications of the proposed models for causal inference directly from pairs of genome sequences belonging to the SARS-CoV-2 virus. Using a large number of sequences, we show that our models capture directed causal information exchange between sequence pairs, presenting novel opportunities for addressing key issues such as contact tracing, motif discovery, and the evolution of virulence and pathogenicity in future applications.
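In the ETC literature, Effort-To-Compress is commonly computed with Non-Sequential Recursive Pair Substitution (NSRPS): repeatedly replace the most frequent adjacent pair with a fresh symbol until the sequence is constant or a single symbol, and count the iterations. Each substitution is also a production rule, so NSRPS doubles as a simple grammar inducer, which makes the decision rule above easy to sketch. This is a minimal illustration under stated assumptions, not any of the paper's three models: pair counts ignore overlaps inside runs, ties go to the first pair encountered, fresh symbols are tuples `("R", k)`, and the hypothetical helpers `learn_grammar`, `apply_grammar`, `cross_score` and `infer_direction` score a direction by the fraction of the target's length left after applying the source's rules.

```python
from collections import Counter

# Assumption: input symbols are single characters, so a fresh symbol
# ("R", k) can never collide with an input symbol.


def pair_counts(seq):
    """Count adjacent pairs, skipping overlaps inside runs such as 'aaa'."""
    counts = Counter()
    last_start = {}  # pair -> start index of its last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_start.get(pair) == i - 1:
            continue  # overlaps the occurrence counted at i - 1
        counts[pair] += 1
        last_start[pair] = i
    return counts


def substitute(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_grammar(seq):
    """NSRPS: return the learned rules (pair, new_symbol) in order, plus the final sequence."""
    seq, rules = list(seq), []
    while len(seq) > 1 and len(set(seq)) > 1:
        counts = pair_counts(seq)
        pair = max(counts, key=counts.get)  # ties: first pair encountered
        new_symbol = ("R", len(rules))
        seq = substitute(seq, pair, new_symbol)
        rules.append((pair, new_symbol))
    return rules, seq


def etc(seq):
    """Effort-To-Compress: number of NSRPS iterations (one rule per iteration)."""
    return len(learn_grammar(seq)[0])


def apply_grammar(rules, seq):
    """Apply another sequence's rules to seq, in the order they were learned."""
    seq = list(seq)
    for pair, new_symbol in rules:
        seq = substitute(seq, pair, new_symbol)
    return seq


def cross_score(x, y):
    """Fraction of len(y) left after compressing y with x's grammar (lower = better)."""
    rules_x, _ = learn_grammar(x)
    return len(apply_grammar(rules_x, y)) / len(y)


def infer_direction(x, y):
    """Infer X -> Y if x's grammar compresses y better than y's grammar compresses x."""
    s_xy, s_yx = cross_score(x, y), cross_score(y, x)
    if s_xy < s_yx:
        return "X -> Y"
    if s_yx < s_xy:
        return "Y -> X"
    return "undecided"


x = "ab" * 8
y = "ab" * 4 + "cd" * 4
print(etc(x), etc(y))                        # 1 7
print(cross_score(x, y), cross_score(y, x))  # 0.75 0.125
print(infer_direction(x, y))                 # Y -> X
```

Here X's single rule (ab) leaves 12 of Y's 16 symbols, while Y's seven rules reduce X to 2 symbols, so the sketch returns "Y -> X": Y's grammar already contains X's structure. Normalizing by the target's length keeps the two directions comparable when the sequences differ in length.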

Code Implementations: 1 repo