LG ML May 15

The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

arXiv:2605.16219

AI Analysis

Provides a foundational theoretical understanding of the privacy cost in tail-risk learning, which is critical for practitioners applying differential privacy to risk-sensitive machine learning tasks.

The paper characterizes how differential privacy degrades the effective sample size for CVaR learning, showing that the privacy-relevant sample size is nτ rather than n, and derives tight excess risk rates for scalar estimation, finite classes, and convex Lipschitz learning under pure and approximate DP.

Differential privacy changes the effective sample size governing CVaR learning. For tail mass $τ$, the privacy-relevant sample size is not $n$, but $nτ$; equivalently, the effective private tail sample size is $εnτ$. Private CVaR excess risk decomposes into ordinary tail-risk statistical error and a privacy price. This decomposition is complete for scalar estimation and finite classes: scalar estimation has rate $Θ(B \min\{1,(nτ)^{-1/2}+(εnτ)^{-1}\})$, and finite classes of size $M$ have rate $Θ(B \min\{1,\sqrt{\log(2M)/(nτ)}+\log(2M)/(εnτ)\})$. These complete rates hold under pure DP, and their lower bounds extend to approximate DP in the stated small-$δ$ regimes. For convex Lipschitz learning, modular upper and lower reductions show that the CVaR-specific privacy term necessarily scales as $1/(εnτ)$, with dimension dependence inherited from private stochastic convex optimization. Together, these results identify ordinary private learning on $Θ(nτ)$ informative tail records as the canonical hard subproblem inside private CVaR learning.
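To make the stated rates concrete, the following is a minimal Python sketch that evaluates the scalar and finite-class expressions from the abstract. It is an illustration only: the function names (`scalar_rate`, `finite_class_rate`, `privacy_term_dominates`) are hypothetical, and the constants hidden by the $Θ(\cdot)$ notation are assumed to be 1, so the outputs indicate scaling behavior rather than actual excess risk.

```python
import math


def scalar_rate(n, tau, eps, B=1.0):
    """Scalar-estimation rate B * min(1, (n*tau)^(-1/2) + (eps*n*tau)^(-1)).

    Assumption: the constant hidden by the Theta notation is taken to be 1.
    """
    n_tail = n * tau                      # tail sample size n*tau
    statistical = n_tail ** -0.5          # ordinary tail-risk statistical error
    privacy = 1.0 / (eps * n_tail)        # privacy price
    return B * min(1.0, statistical + privacy)


def finite_class_rate(n, tau, eps, M, B=1.0):
    """Finite-class rate for a class of size M, with the same unit-constant assumption."""
    n_tail = n * tau
    log_term = math.log(2 * M)
    statistical = math.sqrt(log_term / n_tail)
    privacy = log_term / (eps * n_tail)
    return B * min(1.0, statistical + privacy)


def privacy_term_dominates(n, tau, eps):
    """True when 1/(eps*n*tau) exceeds (n*tau)^(-1/2), i.e. when eps^2 * n * tau < 1."""
    return eps * eps * n * tau < 1.0


if __name__ == "__main__":
    # Illustrative values only: 100,000 records, 5% tail mass.
    for eps in (1.0, 0.1, 0.01):
        rate = scalar_rate(100_000, 0.05, eps)
        dominated = privacy_term_dominates(100_000, 0.05, eps)
        print(f"eps={eps}: rate={rate:.4f}, privacy term dominates: {dominated}")
```

Under these assumptions, the privacy term overtakes the statistical term once $ε^2 nτ < 1$, which is one way to read the abstract's statement that the effective private tail sample size is $εnτ$.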
