LG DC EM ML · May 21, 2023

Federated Offline Policy Learning

arXiv:2305.12407v2
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of training policies on distributed, heterogeneous data without sharing raw data. The advance is incremental because it builds on existing offline policy learning methods.

The paper tackles the problem of learning personalized decision policies from observational bandit feedback across multiple heterogeneous data sources in a federated setting. It introduces a novel regret analysis with finite-sample upper bounds and a federated algorithm, and its results show tradeoffs in the participation of data sources.
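The two regret notions can be made concrete with one common formalization, sketched below as an assumption rather than the paper's exact definitions. Suppose there are K sources, source k has a distribution P_k over contexts X and potential rewards Y(a), Π is the policy class, π̂ is the learned policy, and λ_k ≥ 0 with Σ_k λ_k = 1 are hypothetical aggregation weights (for example, proportional to sample sizes):

```latex
Q_k(\pi) = \mathbb{E}_{P_k}\!\left[ Y(\pi(X)) \right], \qquad
Q(\pi) = \sum_{k=1}^{K} \lambda_k \, Q_k(\pi)

R(\hat\pi) = \max_{\pi \in \Pi} Q(\pi) - Q(\hat\pi) \quad \text{(global regret)}

R_k(\hat\pi) = \max_{\pi \in \Pi} Q_k(\pi) - Q_k(\hat\pi) \quad \text{(local regret for source } k\text{)}
```

Because Q is a weighted average of the Q_k, a maximizer of Q need not maximize any individual Q_k. A small global regret therefore does not by itself imply a small local regret for a source whose distribution differs from the aggregate. That gap is the kind of effect the abstract attributes to source heterogeneity and distribution shift.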

We consider the problem of learning personalized decision policies from observational bandit feedback data across multiple heterogeneous data sources. In our approach, we introduce a novel regret analysis that establishes finite-sample upper bounds on distinguishing notions of global regret for all data sources on aggregate and of local regret for any given data source. We characterize these regret bounds by expressions of source heterogeneity and distribution shift. Moreover, we examine the practical considerations of this problem in the federated setting where a central server aims to train a policy on data distributed across the heterogeneous sources without collecting any of their raw data. We present a policy learning algorithm amenable to federation based on the aggregation of local policies trained with doubly robust offline policy evaluation strategies. Our analysis and supporting experimental results provide insights into tradeoffs in the participation of heterogeneous data sources in offline policy learning.
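The algorithm described in the abstract (local policies trained on doubly robust scores, then aggregated by a central server) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The linear softmax policy class, the known uniform logging propensities, the crude per-action-mean outcome model, the synthetic data, and the FedAvg-style sample-size-weighted averaging are all choices made here. Every function name (`dr_scores`, `local_update`, `federated_round`, `make_source`, `uniform_propensity`) is hypothetical. Each source computes AIPW scores Γ_i(k) = μ̂(X_i, k) + 1{A_i = k}(Y_i − μ̂(X_i, k)) / e(X_i, k) on its own data, and only policy parameters travel to the server.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def policy_probs(theta, x):
    # Linear softmax policy (assumed class): theta[k] is the weight vector for action k.
    return softmax([sum(w * xj for w, xj in zip(row, x)) for row in theta])


def dr_scores(xs, acts, ys, propensity, n_actions):
    """AIPW scores Gamma_i(k) for every logged sample i and action k.

    Outcome model (an assumption): per-action mean reward on this source.
    """
    mu = []
    for k in range(n_actions):
        rewards = [y for a, y in zip(acts, ys) if a == k]
        mu.append(sum(rewards) / len(rewards) if rewards else 0.0)
    scores = []
    for x, a, y in zip(xs, acts, ys):
        g = list(mu)  # direct-method term for every action
        g[a] += (y - mu[a]) / propensity(x, a)  # IPW correction, logged action only
        scores.append(g)
    return scores


def local_update(theta, xs, scores, lr=0.5, steps=20):
    """Gradient ascent on the local DR value mean_i sum_k pi(k|x_i) Gamma_i(k)."""
    theta = [row[:] for row in theta]  # copy so the global parameters are untouched
    n = len(xs)
    for _ in range(steps):
        grad = [[0.0] * len(row) for row in theta]
        for x, g in zip(xs, scores):
            p = policy_probs(theta, x)
            v = sum(pk * gk for pk, gk in zip(p, g))
            for k in range(len(theta)):
                coef = p[k] * (g[k] - v) / n  # d value / d logit_k
                for j, xj in enumerate(x):
                    grad[k][j] += coef * xj
        for k, row in enumerate(theta):
            for j in range(len(row)):
                row[j] += lr * grad[k][j]
    return theta


def federated_round(theta, sources):
    """FedAvg-style round: local training, then sample-size-weighted averaging."""
    total = sum(len(s["x"]) for s in sources)
    new = [[0.0] * len(row) for row in theta]
    for s in sources:
        local = local_update(theta, s["x"], s["scores"])  # runs at the source
        w = len(s["x"]) / total
        for k, row in enumerate(local):
            for j, val in enumerate(row):
                new[k][j] += w * val
    return new


def make_source(rng, n, shift):
    # Synthetic source: context z ~ N(shift, 1); action 1 is best iff z > 0.
    xs, acts, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(shift, 1.0)
        a = rng.randrange(2)  # uniform logging policy, propensity 0.5
        best = 1 if z > 0 else 0
        xs.append([1.0, z])  # intercept plus context
        acts.append(a)
        ys.append((1.0 if a == best else 0.0) + rng.gauss(0.0, 0.1))
    return {"x": xs, "a": acts, "y": ys}


def uniform_propensity(x, k):
    return 0.5


rng = random.Random(0)
sources = [make_source(rng, 200, -1.0), make_source(rng, 50, 1.0)]
for s in sources:
    s["scores"] = dr_scores(s["x"], s["a"], s["y"], uniform_propensity, 2)

theta = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(10):
    theta = federated_round(theta, sources)
```

Only `theta` crosses the source boundary in `federated_round`. The raw `x`, `a`, and `y` stay inside each source's dictionary. Weighting by sample size roughly targets the aggregate objective, so the larger source (shift −1) dominates. Changing the weights shifts which sources' local regret is favored, which is one simple way to see a participation tradeoff.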
