LG, ML | Aug 5, 2025

DP-NCB: Privacy Preserving Fair Bandits

Dhruv Sarkar, Nishant Pandey, Sayak Ray Chowdhury

arXiv:2508.03836v1 | 2 citations | h-index: 13

Originality: Highly original

AI Analysis

The work addresses the need for privacy-preserving and fair decision-making in socially sensitive applications such as clinical trials, offering a unified solution where prior work handled the two objectives separately.

The paper tackles the problem of simultaneously ensuring privacy and fairness in multi-armed bandit algorithms. It introduces DP-NCB, which achieves order-optimal Nash regret while providing differential privacy, and its simulations show substantially lower Nash regret than baselines.

Multi-armed bandit algorithms are fundamental tools for sequential decision-making under uncertainty, with widespread applications across domains such as clinical trials and personalized decision-making. As bandit algorithms are increasingly deployed in these socially sensitive settings, it becomes critical to protect user data privacy and ensure fair treatment across decision rounds. While prior work has independently addressed privacy and fairness in bandit settings, the question of whether both objectives can be achieved simultaneously has remained largely open. Existing privacy-preserving bandit algorithms typically optimize average regret, a utilitarian measure, whereas fairness-aware approaches focus on minimizing Nash regret, which penalizes inequitable reward distributions, but often disregard privacy concerns. To bridge this gap, we introduce Differentially Private Nash Confidence Bound (DP-NCB), a novel and unified algorithmic framework that simultaneously ensures $ε$-differential privacy and achieves order-optimal Nash regret, matching known lower bounds up to logarithmic factors. The framework is sufficiently general to operate under both global and local differential privacy models, and is anytime, requiring no prior knowledge of the time horizon. We support our theoretical guarantees with simulations on synthetic bandit instances, showing that DP-NCB incurs substantially lower Nash regret than state-of-the-art baselines. Our results offer a principled foundation for designing bandit algorithms that are both privacy-preserving and fair, making them suitable for high-stakes, socially impactful applications.
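The contrast between average regret and Nash regret drawn in the abstract can be illustrated with a small sketch. It assumes the formulation commonly used in the fair-bandits literature (the abstract itself gives no formula): average regret compares the best arm's mean with the arithmetic mean of the per-round expected rewards, while Nash regret compares it with their geometric mean. The function names and reward sequences below are hypothetical and are not taken from the paper.

```python
import math

def average_regret(best_mean, expected_rewards):
    """Best arm's mean minus the arithmetic mean of per-round expected rewards."""
    return best_mean - sum(expected_rewards) / len(expected_rewards)

def nash_regret(best_mean, expected_rewards):
    """Best arm's mean minus the geometric mean of per-round expected rewards.

    Assumed definition (not stated in the abstract); rewards lie in [0, 1].
    """
    if any(r <= 0 for r in expected_rewards):
        geometric_mean = 0.0  # a single zero-reward round collapses the product
    else:
        # Compute the geometric mean in log space to avoid underflow.
        log_sum = sum(math.log(r) for r in expected_rewards)
        geometric_mean = math.exp(log_sum / len(expected_rewards))
    return best_mean - geometric_mean

# Hypothetical sequences of per-round expected rewards with the same total.
best = 0.9
even = [0.8] * 10                  # every round is treated about equally well
uneven = [0.9] * 8 + [0.4] * 2     # two rounds are treated much worse

for name, seq in [("even", even), ("uneven", uneven)]:
    print(name, average_regret(best, seq), nash_regret(best, seq))
```

Both sequences have the same average regret, but the uneven one has higher Nash regret, because the geometric mean is pulled down by the poorly served rounds. By the AM-GM inequality, Nash regret under this definition is never smaller than average regret.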
