ML, CR, LG | Sep 11, 2024

Is merging worth it? Securely evaluating the information gain for causal dataset acquisition

arXiv:2409.07215v3 | 2 citations | h-index: 6 | Has Code
Originality: Highly original
AI Analysis

This paper addresses the costly, privacy-sensitive problem of dataset acquisition for causal inference, giving institutions a way to judge a candidate merge before committing to it.

The paper tackles the challenge of securely evaluating the value of merging private datasets for causal estimation. It introduces a cryptographically secure, information-theoretic approach built on multi-party computation, optionally combined with differential privacy, and demonstrates its effectiveness on simulated and realistic benchmarks.

Merging datasets across institutions is a lengthy and costly procedure, especially when it involves private information. Data hosts may therefore want to prospectively gauge which datasets are most beneficial to merge with, without revealing sensitive information. For causal estimation this is particularly challenging as the value of a merge depends not only on reduction in epistemic uncertainty but also on improvement in overlap. To address this challenge, we introduce the first cryptographically secure information-theoretic approach for quantifying the value of a merge in the context of heterogeneous treatment effect estimation. We do this by evaluating the Expected Information Gain (EIG) using multi-party computation to ensure that no raw data is revealed. We further demonstrate that our approach can be combined with differential privacy (DP) to meet arbitrary privacy requirements whilst preserving more accurate computation compared to DP alone. To the best of our knowledge, this work presents the first privacy-preserving method for dataset acquisition tailored to causal estimation. We demonstrate the effectiveness and reliability of our method on a range of simulated and realistic benchmarks. Code is publicly available: https://github.com/LucileTerminassian/causal_prospective_merge.
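To make the abstract's point about overlap concrete, here is a minimal plaintext sketch of an Expected Information Gain calculation. It is not the paper's secure protocol: there is no multi-party computation and no differential privacy, and the model is an assumed Bayesian linear regression with known noise variance, chosen because its EIG has a closed form (half the log-determinant ratio of the posterior precision after and before the merge). All function names, the feature map, and the toy data are hypothetical.

```python
import math

def features(x, t):
    # Assumed linear model: outcome = a0 + a1*x + t*(b0 + b1*x) + noise,
    # so the treatment effect b0 + b1*x varies with the covariate x.
    return [1.0, x, float(t), t * x]

def add_gram(precision, rows, noise_var):
    # Return precision + (1/noise_var) * sum of outer products of the rows.
    d = len(precision)
    out = [r[:] for r in precision]
    for phi in rows:
        for i in range(d):
            for j in range(d):
                out[i][j] += phi[i] * phi[j] / noise_var
    return out

def logdet(matrix):
    # Log-determinant of a symmetric positive-definite matrix via Cholesky.
    d = len(matrix)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(d))

def expected_information_gain(precision, candidate_rows, noise_var=1.0):
    # Gaussian linear model: EIG = 0.5 * (logdet(after) - logdet(before)).
    # It depends only on the candidate's covariates, not on its outcomes.
    updated = add_gram(precision, candidate_rows, noise_var)
    return 0.5 * (logdet(updated) - logdet(precision))

# Toy setup: the host has only treated units, so it has no overlap.
xs = [i / 10 - 1 for i in range(21)]
prior_precision = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
host_rows = [features(x, 1) for x in xs]
host_precision = add_gram(prior_precision, host_rows, 1.0)

# Candidate A repeats the treated arm; candidate B supplies the missing controls.
eig_treated = expected_information_gain(host_precision, [features(x, 1) for x in xs])
eig_control = expected_information_gain(host_precision, [features(x, 0) for x in xs])
print(f"EIG, more treated units: {eig_treated:.3f}")
print(f"EIG, control units:      {eig_control:.3f}")
```

In this toy setup the candidate that supplies control units yields the larger EIG, because it informs directions of the parameter space that the host's treated-only data leave at the prior. A secure version would have to compute the same quantity without either party revealing its rows, which is the part this sketch leaves out.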

