LG · Sep 3, 2024

Federated Prediction-Powered Inference from Decentralized Data

arXiv:2409.01730v1 · 4 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the challenge of data silos in domains where private data cannot be shared, enabling decentralized inference while maintaining statistical validity.

The paper tackles the problem of performing statistically valid inference when gold-standard data is decentralized and private, by introducing the Federated Prediction-Powered Inference (Fed-PPI) framework, which combines federated learning with prediction-powered inference to produce valid confidence intervals without sharing private data.

In various domains, the increasing application of machine learning gives researchers access to inexpensive predictive data, which can be used as auxiliary data for statistical inference. Although such data are often unreliable compared to gold-standard datasets, Prediction-Powered Inference (PPI) has been proposed to ensure statistical validity despite the unreliability. However, the challenge of "data silos" arises when the private gold-standard datasets cannot be shared for model training, leading to less accurate predictive models and invalid inferences. In this paper, we introduce the Federated Prediction-Powered Inference (Fed-PPI) framework, which addresses this challenge by enabling decentralized experimental data to contribute to statistically valid conclusions without sharing private information. The Fed-PPI framework involves training local models on private data, aggregating them through Federated Learning (FL), and deriving confidence intervals using PPI computation. The proposed framework is evaluated through experiments, demonstrating its effectiveness in producing valid confidence intervals.
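The three steps named in the abstract (local training, FL aggregation, PPI confidence intervals) can be illustrated with a minimal sketch. This is not the paper's implementation; everything here is an assumption made for illustration: the linear models, the single sample-size-weighted averaging round, the target (the mean of Y), the synthetic data, and all function names. The sketch also assumes each client holds out part of its labeled data for the PPI correction term, so the model is not evaluated on data it was trained on, and that the server holds the unlabeled covariates. Clients share only model weights and three residual aggregates, never raw records.

```python
import numpy as np
from statistics import NormalDist


def fit_local(X, y):
    """Least-squares weights for one client (bias term appended last)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def fed_avg(weights, sizes):
    """Sample-size-weighted average of client weights (one FedAvg-style round)."""
    return np.average(np.stack(weights), axis=0,
                      weights=np.asarray(sizes, dtype=float))


def predict(w, X):
    return X @ w[:-1] + w[-1]


def rectifier_stats(w, X, y):
    """Aggregates a client shares: count, sum, and sum of squares of residuals."""
    r = y - predict(w, X)
    return len(r), r.sum(), (r ** 2).sum()


def ppi_mean_ci(preds_unlabeled, stats, alpha=0.1):
    """PPI interval for E[Y]: mean prediction plus pooled residual correction."""
    n = sum(s[0] for s in stats)
    s1 = sum(s[1] for s in stats)
    s2 = sum(s[2] for s in stats)
    rect_mean = s1 / n
    rect_var = (s2 - n * rect_mean ** 2) / (n - 1)  # pooled sample variance
    N = len(preds_unlabeled)
    theta = preds_unlabeled.mean() + rect_mean
    se = np.sqrt(preds_unlabeled.var(ddof=1) / N + rect_var / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta - z * se, theta + z * se


# Hypothetical synthetic setup: three clients with private labeled data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])


def make_data(n):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.5 + rng.normal(scale=0.3, size=n)
    return X, y


clients = [make_data(n) for n in (200, 300, 400)]

# Each client trains on the first half of its data and keeps the rest held out.
halves = [(X[: len(X) // 2], y[: len(y) // 2], X[len(X) // 2:], y[len(y) // 2:])
          for X, y in clients]
local_w = [fit_local(Xt, yt) for Xt, yt, _, _ in halves]
global_w = fed_avg(local_w, [len(yt) for _, yt, _, _ in halves])

# Clients send residual aggregates computed on held-out data with the global model.
stats = [rectifier_stats(global_w, Xr, yr) for _, _, Xr, yr in halves]

# The server predicts on its unlabeled covariates and forms the interval.
X_unlabeled = rng.normal(size=(5000, 2))
lo, hi = ppi_mean_ci(predict(global_w, X_unlabeled), stats, alpha=0.1)
print(f"90% PPI interval for E[Y]: [{lo:.3f}, {hi:.3f}]")
```

Sharing the count, sum, and sum of squares is enough to recover the pooled mean and variance of the residuals exactly, which is why the server can compute the interval width without seeing individual labels. A real deployment would likely run multiple FL rounds and may need further privacy protections on the shared aggregates.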

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
