ML LG ST

Dec 28, 2021

Multitask Learning and Bandits via Robust Statistics

arXiv:2112.14233v5

9.4

10 citations

Originality: Incremental advance

AI Analysis

This work targets sample efficiency in multitask learning and contextual bandits, with applications such as retail and healthcare. The advance is incremental: it builds on existing robust statistics and LASSO methods.

The paper tackles the problem of learning across many related but heterogeneous tasks by proposing a two-stage multitask estimator that combines robust statistics and LASSO regression, resulting in improved sample complexity bounds, including exponential gains for data-poor instances, and enhanced regret bounds in contextual bandits.

Decision-makers often simultaneously face many related but heterogeneous learning problems. For instance, a large retailer may wish to learn product demand at different stores to solve pricing or inventory problems, making it desirable to learn jointly for stores serving similar customers; alternatively, a hospital network may wish to learn patient risk at different providers to allocate personalized interventions, making it desirable to learn jointly for hospitals serving similar patient populations. Motivated by real datasets, we study a natural setting where the unknown parameter in each learning instance can be decomposed into a shared global parameter plus a sparse instance-specific term. We propose a novel two-stage multitask learning estimator that exploits this structure in a sample-efficient way, using a unique combination of robust statistics (to learn across similar instances) and LASSO regression (to debias the results). Our estimator yields improved sample complexity bounds in the feature dimension $d$ relative to commonly-employed estimators; this improvement is exponential for "data-poor" instances, which benefit the most from multitask learning. We illustrate the utility of these results for online learning by embedding our multitask estimator within simultaneous contextual bandit algorithms. We specify a dynamic calibration of our estimator to appropriately balance the bias-variance tradeoff over time, improving the resulting regret bounds in the context dimension $d$. Finally, we illustrate the value of our approach on synthetic and real datasets.
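To make the two-stage idea concrete, here is a minimal sketch in Python. It is not the paper's estimator: the abstract does not specify the robust statistic, the loss, or the tuning, so the sketch assumes per-task least squares combined by a coordinate-wise median for the shared parameter, followed by a LASSO fit on each task's residuals to recover the sparse instance-specific term. All function names, the penalty `lam`, and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by proximal gradient (ISTA).
    n, d = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

def two_stage_multitask(Xs, ys, lam):
    # Stage 1 (assumed robust step): per-task OLS, then a coordinate-wise
    # median across tasks as the estimate of the shared parameter.
    ols = np.stack([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)])
    shared = np.median(ols, axis=0)
    # Stage 2 (assumed debiasing step): LASSO on each task's residuals
    # estimates the sparse deviation from the shared parameter.
    deltas = [lasso_ista(X, y - X @ shared, lam) for X, y in zip(Xs, ys)]
    return shared, [shared + delta for delta in deltas]

# Synthetic check: 20 tasks, d = 10, each task deviates from the shared
# parameter in exactly one coordinate.
rng = np.random.default_rng(0)
d, n_tasks, n = 10, 20, 50
beta_shared = rng.normal(size=d)
betas_true, Xs, ys = [], [], []
for j in range(n_tasks):
    beta_j = beta_shared.copy()
    beta_j[j % d] += 2.0  # sparse instance-specific term
    X = rng.normal(size=(n, d))
    y = X @ beta_j + 0.1 * rng.normal(size=n)
    betas_true.append(beta_j)
    Xs.append(X)
    ys.append(y)

shared_hat, betas_hat = two_stage_multitask(Xs, ys, lam=0.05)
shared_err = np.max(np.abs(shared_hat - beta_shared))
task_err = max(np.max(np.abs(b_hat - b)) for b_hat, b in zip(betas_hat, betas_true))
```

The median is used here because only a few tasks deviate in any given coordinate, so those deviations act like outliers that the median ignores; a trimmed mean would be another reasonable choice under the same assumption.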
