Distributed, communication-efficient, and differentially private estimation of KL divergence
This work addresses privacy and communication challenges in federated learning and analytics, though the contribution is incremental: it builds on existing methods for private estimation.
The paper tackles the problem of estimating KL divergence across distributed, sensitive data while ensuring differential privacy and communication efficiency, achieving accuracy comparable to non-private baselines.
A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes. Understanding this drift can effectively support a variety of federated learning and analytics tasks. However, in many practical settings, sharing such information can be undesirable (e.g., because of privacy concerns) or infeasible (e.g., because of high communication costs). In this work, we describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy. We analyze their theoretical properties and present an empirical study of their performance. We explore parameter settings that optimize the accuracy of the algorithm in each of these settings; these yield sub-variations that are applicable to real-world tasks and address different context- and application-specific trust requirements. Our experimental results confirm that our private estimators achieve accuracy comparable to a baseline algorithm without differential privacy guarantees.
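To make the task concrete, here is a minimal sketch of one generic way to estimate KL divergence from privatized histograms. It is not the paper's algorithm, which the abstract does not specify. It assumes a central trusted aggregator, categorical data over a fixed set of bins, add-or-remove-one-record neighboring datasets (so each histogram has L1 sensitivity 1), and the standard Laplace mechanism. The function names, the smoothing constant `alpha`, and the example counts are all hypothetical, and privacy budget accounting across the two histograms is left out.

```python
import numpy as np

def private_histogram(counts, epsilon, rng):
    """Release a histogram with the Laplace mechanism.

    Assumes add/remove-one-record neighbors, so the L1 sensitivity is 1
    and the noise scale is 1 / epsilon.
    """
    counts = np.asarray(counts, dtype=float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

def smoothed_distribution(noisy_counts, alpha=1.0):
    """Turn (possibly negative) noisy counts into a strictly positive
    probability vector. This is post-processing, so it does not affect
    the differential privacy guarantee of the noisy counts."""
    clipped = np.maximum(np.asarray(noisy_counts, dtype=float), 0.0) + alpha
    return clipped / clipped.sum()

def kl_divergence(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def private_kl(counts_p, counts_q, epsilon, seed=0):
    """Plug-in KL estimate computed from two separately privatized histograms."""
    rng = np.random.default_rng(seed)
    p = smoothed_distribution(private_histogram(counts_p, epsilon, rng))
    q = smoothed_distribution(private_histogram(counts_q, epsilon, rng))
    return kl_divergence(p, q)

# Hypothetical aggregated counts for a reference and a current dataset.
reference_counts = [500, 300, 200]
current_counts = [400, 300, 300]

non_private = kl_divergence(
    smoothed_distribution(reference_counts),
    smoothed_distribution(current_counts),
)
private = private_kl(reference_counts, current_counts, epsilon=1.0)
print(f"non-private: {non_private:.4f}, private (epsilon=1): {private:.4f}")
```

Smaller values of `epsilon` add more noise and usually make the private estimate less accurate. A communication-efficient or locally private variant would change where the noise is added and what each client sends, which this sketch does not attempt to model.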