LG, IT | Jan 15, 2021

Deciding What to Learn: A Rate-Distortion Approach

arXiv:2101.06197v3 | 30 citations
Originality: Incremental advance
AI Analysis

This work helps AI/ML researchers and practitioners design agents that learn efficiently under constraints by offering an automated method for balancing learning effort against performance. The advance is incremental in that it builds on existing rate-distortion theory.

The paper automates the trade-off between information acquisition and policy sub-optimality in sequential decision-making. Its rate-distortion approach lets an agent compute its own learning targets from a designer's preferences, which are expressed through a single hyperparameter. The results are a general bound on expected discounted regret and computational experiments that show improvements over Thompson sampling in identifying an optimal policy.

Agents that learn to select optimal actions represent a prominent focus of the sequential decision-making literature. In the face of a complex environment or constraints on time and resources, however, aiming to synthesize such an optimal policy can become infeasible. These scenarios give rise to an important trade-off between the information an agent must acquire to learn and the sub-optimality of the resulting policy. While an agent designer has a preference for how this trade-off is resolved, existing approaches further require that the designer translate these preferences into a fixed learning target for the agent. In this work, leveraging rate-distortion theory, we automate this process such that the designer need only express their preferences via a single hyperparameter and the agent is endowed with the ability to compute its own learning targets that best achieve the desired trade-off. We establish a general bound on expected discounted regret for an agent that decides what to learn in this manner along with computational experiments that illustrate the expressiveness of designer preferences and even show improvements over Thompson sampling in identifying an optimal policy.
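To make the trade-off in the abstract concrete, here is a minimal sketch of the classical Blahut-Arimoto iteration from rate-distortion theory, applied to a toy problem with a finite set of candidate environments and actions. It is an illustration under stated assumptions, not the paper's implementation: the function name `blahut_arimoto`, the regret-gap distortion, the toy reward table, and the use of `beta` as the single trade-off hyperparameter are all choices made here for the example.

```python
import numpy as np

def blahut_arimoto(p_env, distortion, beta, iters=200):
    """Classical Blahut-Arimoto iteration (hypothetical helper for illustration).

    p_env:      (E,) prior probabilities over candidate environments.
    distortion: (E, A) cost of targeting action a when the environment is e.
    beta:       non-negative weight on distortion; larger values favor
                lower distortion at the price of more bits of information.
    Returns the channel p(a | e), the rate I(E; A) in nats, and the
    expected distortion.
    """
    n_env, n_act = distortion.shape
    q = np.full(n_act, 1.0 / n_act)  # marginal over target actions
    for _ in range(iters):
        # p(a | e) is proportional to q(a) * exp(-beta * d(e, a))
        logits = np.log(np.maximum(q, 1e-300))[None, :] - beta * distortion
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        cond = np.exp(logits)
        cond /= cond.sum(axis=1, keepdims=True)
        q = p_env @ cond  # re-estimate the marginal
    joint = p_env[:, None] * cond
    q_full = np.broadcast_to(q, cond.shape)
    mask = joint > 0
    rate = float(np.sum(joint[mask] * np.log(cond[mask] / q_full[mask])))
    expected_distortion = float(np.sum(joint * distortion))
    return cond, rate, expected_distortion

# Toy example (assumed numbers): mean reward of 3 actions in 3 environments.
means = np.array([[0.90, 0.80, 0.10],
                  [0.20, 0.90, 0.85],
                  [0.10, 0.30, 0.90]])
# Assumed distortion: regret gap of each action relative to the best one.
gaps = means.max(axis=1, keepdims=True) - means
prior = np.full(3, 1.0 / 3.0)

for beta in (0.0, 5.0, 50.0):
    _, rate, dist = blahut_arimoto(prior, gaps, beta)
    print(f"beta={beta:5.1f}  rate={rate:.4f} nats  distortion={dist:.4f}")
```

In this sketch, `beta = 0` yields a target that requires no information about the environment and accepts high regret, while a large `beta` pushes the target toward the optimal action in each environment at the cost of more bits to learn.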

