ME · CR · LG · Feb 5, 2024

Estimation of conditional average treatment effects on distributed confidential data

arXiv:2402.02672v5 · 1 citation · h-index: 18 · Expert Syst Appl
Originality: Incremental advance
AI Analysis

The paper addresses privacy concerns in distributed data analysis in fields such as healthcare and the social sciences, but the advance is incremental because it builds on existing double machine learning methods.

The paper tackles the problem of estimating conditional average treatment effects (CATEs) on distributed confidential data by proposing data collaboration double machine learning, which uses privacy-preserving fusion data and shows performance comparable to or better than existing methods in simulations.

The estimation of conditional average treatment effects (CATEs) is an important topic in many scientific fields. CATEs can be estimated with high accuracy if data distributed across multiple parties are centralized. However, it is difficult to aggregate such data owing to confidentiality or privacy concerns. To address this issue, we propose data collaboration double machine learning, a method for estimating CATE models using privacy-preserving fusion data constructed from distributed sources, and evaluate its performance through simulations. We make three main contributions. First, our method enables estimation and testing of semi-parametric CATE models without iterative communication on distributed data, providing robustness to model mis-specification compared to parametric approaches. Second, it enables collaborative estimation across different time points and parties by accumulating a knowledge base. Third, our method performs as well as or better than existing methods in simulations using synthetic, semi-synthetic, and real-world datasets.
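As a rough illustration of the double machine learning stage that the abstract builds on, the sketch below estimates a linear CATE model by cross-fitted partialling-out on data that is already pooled. It is a generic textbook-style sketch, not the authors' method: the privacy-preserving fusion step is not shown because the abstract does not describe how it is constructed. The function names (`simulate`, `dml_linear_cate`), the linear nuisance models, and the simulated data-generating process are all assumptions made for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def dml_linear_cate(X, T, Y, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of beta in CATE(x) = [1, x] @ beta.

    Assumption for this sketch: both nuisance functions E[Y|X] and E[T|X]
    are approximated by linear regressions.
    """
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Nuisance models are fitted on the other folds only (cross-fitting).
        y_res[test] = Y[test] - predict_ols(fit_ols(X[train], Y[train]), X[test])
        t_res[test] = T[test] - predict_ols(fit_ols(X[train], T[train]), X[test])
    # Final stage: regress outcome residuals on the treatment residuals
    # interacted with the CATE basis [1, x].
    Z = np.column_stack([np.ones(n), X]) * t_res[:, None]
    beta, *_ = np.linalg.lstsq(Z, y_res, rcond=None)
    return beta


def simulate(n=5000, seed=0):
    """Hypothetical data: randomized binary treatment, true CATE(x) = 1 + 2 * x0."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    T = rng.binomial(1, 0.5, size=n).astype(float)
    Y = X[:, 1] + (1.0 + 2.0 * X[:, 0]) * T + rng.normal(size=n)
    return X, T, Y


if __name__ == "__main__":
    X, T, Y = simulate()
    # Coefficients are ordered as [intercept, x0, x1].
    print(dml_linear_cate(X, T, Y))
```

In this simulated setting the estimated coefficients should land close to the true values (1, 2, 0). In the distributed setting the abstract describes, the inputs to such an estimator would be the privacy-preserving fusion data rather than the raw covariates.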

