LG AI · Feb 12, 2024

Avoiding Catastrophe in Online Learning by Asking for Help

Benjamin Plaut, Hanlin Zhu, Stuart Russell

arXiv:2402.08062v69.25 citations · h-index: 5 · ICML

Originality: Incremental advance

AI Analysis

The paper addresses safety-critical online learning, where irreversible errors must be avoided. The advance is rated incremental because the result builds on standard learnability assumptions.

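The paper studies online learning in which some mistakes are catastrophic. In its model, each round's payoff is the chance of avoiding catastrophe in that round, and the agent tries to maximize the product of payoffs while making only a limited number of queries to a mentor. It shows that, in general, any algorithm either queries the mentor at a linear rate or is nearly guaranteed to cause catastrophe. When the mentor policy class is learnable in the standard online model, however, the paper gives an algorithm whose regret and query rate both approach zero as the time horizon grows.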

Most learning algorithms with formal regret guarantees assume that all mistakes are recoverable and essentially rely on trying all possible behaviors. This approach is problematic when some mistakes are "catastrophic", i.e., irreparable. We propose an online learning problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff in each round represents the chance of avoiding catastrophe in that round and try to maximize the product of payoffs (the overall chance of avoiding catastrophe) while allowing a limited number of queries to a mentor. We also assume that the agent can transfer knowledge between similar inputs. We first show that in general, any algorithm either queries the mentor at a linear rate or is nearly guaranteed to cause catastrophe. However, in settings where the mentor policy class is learnable in the standard online model, we provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. Although our focus is the product of payoffs, we provide matching bounds for the typical additive regret. Conceptually, if a policy class is learnable in the absence of catastrophic risk, it is learnable in the presence of catastrophic risk if the agent can ask for help.
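The difference between the product objective and the usual additive one can be illustrated with a small sketch. The payoff sequences and the helper names `survival_probability` and `average_payoff` below are hypothetical, chosen only for illustration; they are not taken from the paper, and the sketch does not implement the paper's algorithm or its regret definition.

```python
import math

def survival_probability(payoffs):
    """Product of per-round payoffs: the overall chance of avoiding
    catastrophe when each payoff is that round's chance of avoiding it."""
    return math.prod(payoffs)

def average_payoff(payoffs):
    """Mean per-round payoff, the quantity an additive objective tracks."""
    return sum(payoffs) / len(payoffs)

# Hypothetical sequences over 100 rounds (assumed values, for illustration).
cautious = [0.999] * 100        # slightly imperfect in every round
reckless = [1.0] * 99 + [0.5]   # perfect except for one risky round

for name, payoffs in [("cautious", cautious), ("reckless", reckless)]:
    print(name,
          "average:", round(average_payoff(payoffs), 4),
          "product:", round(survival_probability(payoffs), 4))
```

Under these assumed numbers the two sequences have nearly the same average payoff, but the single risky round halves the overall chance of avoiding catastrophe, which is the kind of irreparable mistake an additive objective barely registers.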

