LG, CR · Jul 23, 2025

Lower Bounds for Public-Private Learning under Distribution Shift

CMU
arXiv:2507.17895v1 · h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work proves theoretical lower bounds for differentially private machine learning with public data under distribution shift. The contribution is incremental: it extends prior lower bounds, which assumed identically distributed public and private data, to shifted distributions.

The paper tackles public-private learning under distribution shift. It shows that when the shift is small, either the public or the private data must be abundant enough on its own to estimate the private parameter, and that when the shift is large, public data offers no benefit.

The most effective differentially private machine learning algorithms in practice rely on an additional source of purportedly public data. This paradigm is most interesting when the two sources combine to be more than the sum of their parts. However, in some settings, such as mean estimation, strong lower bounds show that when the two data sources have the same distribution, there is no complementary value in combining them. In this work we extend the known lower bounds for public-private learning to settings where the two data sources exhibit significant distribution shift. Our results apply both to Gaussian mean estimation, where the two distributions have different means, and to Gaussian linear regression, where the two distributions exhibit parameter shift. We find that when the shift is small (relative to the desired accuracy), either the public or the private data must be sufficiently abundant to estimate the private parameter. Conversely, when the shift is large, public data provides no benefit.
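To make the mean-estimation setting concrete, here is a small Python simulation using only the standard library. It is a sketch under our own assumptions, not the paper's construction or estimator: 1-D data with unit variance, private samples from N(0, 1), public samples from N(shift, 1), a textbook Gaussian-mechanism private mean, and a hypothetical `weight` that mixes in the public mean. The names `dp_mean`, `mixed_estimate` and `mse` are illustrative. The simulation only illustrates the intuition in the abstract; a lower bound is a statement about every private estimator and cannot be demonstrated by simulating one heuristic.

```python
import math
import random


def dp_mean(xs, clip, eps, delta, rng):
    """(eps, delta)-DP estimate of the mean of xs via the classic Gaussian mechanism.

    Values are clipped to [-clip, clip], so replacing one record moves the
    clipped mean by at most 2 * clip / n (the sensitivity in 1-D).
    The noise calibration below is the textbook one, valid for 0 < eps < 1.
    """
    n = len(xs)
    clipped = [max(-clip, min(clip, x)) for x in xs]
    sensitivity = 2 * clip / n
    noise_sd = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    return sum(clipped) / n + rng.gauss(0.0, noise_sd)


def mixed_estimate(private, public, weight, clip, eps, delta, rng):
    """Convex combination of the DP private mean and the plain public mean.

    Public data needs no privacy protection, and combining it with an
    already-private output is post-processing, so the result stays
    (eps, delta)-DP with respect to the private records.
    `weight` is a hypothetical knob: the share placed on the public mean.
    """
    private_mean = dp_mean(private, clip, eps, delta, rng)
    public_mean = sum(public) / len(public)
    return weight * public_mean + (1 - weight) * private_mean


def mse(shift, weight, trials=200, n_priv=100, n_pub=1000, seed=0):
    """Monte Carlo mean squared error for estimating the *private* mean.

    Assumed model: private data ~ N(0, 1), public data ~ N(shift, 1).
    """
    rng = random.Random(seed)
    mu_priv = 0.0
    mu_pub = mu_priv + shift  # the public distribution has a shifted mean
    total = 0.0
    for _ in range(trials):
        private = [rng.gauss(mu_priv, 1.0) for _ in range(n_priv)]
        public = [rng.gauss(mu_pub, 1.0) for _ in range(n_pub)]
        est = mixed_estimate(private, public, weight,
                             clip=3.0, eps=0.5, delta=1e-5, rng=rng)
        total += (est - mu_priv) ** 2
    return total / trials


if __name__ == "__main__":
    for shift in (0.01, 2.0):
        for weight in (0.0, 0.5):
            print(f"shift={shift:<5} weight={weight:<4} MSE={mse(shift, weight):.3f}")
```

With these arbitrary settings the DP noise on the private mean has a standard deviation of roughly 0.58. At shift 0.01 the 50/50 mix should therefore have noticeably lower error than the private-only estimate, while at shift 2.0 it adds a bias of about 1.0 and does worse. This mirrors, in a toy form, the small-shift versus large-shift dichotomy described in the abstract.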
