AI · Jun 16, 2022

How to talk so AI will learn: Instructions, descriptions, and autonomy

arXiv:2206.07870v3 · 29 citations · h-index: 99 · Has Code
Originality: Highly original
AI Analysis

This paper addresses the value-alignment problem in AI by enabling agents to learn preferences from language, shifting the goal from agents that obey language to agents that learn from it.

The paper formalizes learning from human language in a contextual bandit setting. It shows that instructions are optimal for low-autonomy agents, while descriptions are better for agents that must act independently. It then validates its models with a behavioral experiment showing that the speaker model predicts human behavior and that the pragmatic listener recovers human reward functions.
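The autonomy result can be made concrete with a small calculation. The sketch below is not the paper's model or code: it assumes a hypothetical linear reward R(a) = w · φ(a) over three features, hand-picked contexts, and a deliberately literal listener that follows an instruction only in the context where it was given (acting uniformly at random afterwards), while a one-feature description is used greedily in every later context. All names (`W_TRUE`, `C0`, `FUTURE`, `best_utterance`, and so on) are illustrative.

```python
# Toy illustration (assumed setup, not the authors' implementation):
# reward is linear in action features, R(a) = w . phi(a).
W_TRUE = [3.0, 1.0, -2.0]  # hypothetical true feature weights

# Each context is a list of available actions, given as feature vectors.
C0 = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]  # context in which the speaker talks
FUTURE = [                               # contexts the listener later faces alone
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 1, 1], [1, 1, 0], [0, 0, 0]],
]

def reward(w, phi):
    """Linear reward w . phi."""
    return sum(wi * xi for wi, xi in zip(w, phi))

def greedy_value(w_hat, context, w_true):
    """Expected true reward of acting greedily on w_hat, splitting ties uniformly."""
    scores = [reward(w_hat, phi) for phi in context]
    best = max(scores)
    ties = [phi for phi, s in zip(context, scores) if s == best]
    return sum(reward(w_true, phi) for phi in ties) / len(ties)

def uniform_value(context, w_true):
    """Expected true reward of picking an action uniformly at random."""
    return sum(reward(w_true, phi) for phi in context) / len(context)

def utterance_value(utt, w_true, c0, future):
    """Total reward the listener earns in c0 plus every future context."""
    kind, idx = utt
    if kind == "instruction":
        # Assumed literal listener: takes the named action now, but the
        # instruction carries no information into new contexts.
        now = reward(w_true, c0[idx])
        later = sum(uniform_value(c, w_true) for c in future)
    else:
        # Description "feature idx is worth w_idx"; other weights stay at 0.
        w_hat = [0.0] * len(w_true)
        w_hat[idx] = w_true[idx]
        now = greedy_value(w_hat, c0, w_true)
        later = sum(greedy_value(w_hat, c, w_true) for c in future)
    return now + later

def best_utterance(w_true, c0, future):
    """Utterance a reward-maximizing speaker would choose."""
    utts = [("instruction", i) for i in range(len(c0))]
    utts += [("description", k) for k in range(len(w_true))]
    return max(utts, key=lambda u: utterance_value(u, w_true, c0, future))
```

Calling `best_utterance(W_TRUE, C0, FUTURE[:k])` for k = 0, 1, 2 selects `("instruction", 0)` when the listener acts only in the current context and `("description", 0)` once it must act in one or more further contexts. This mirrors the qualitative claim, though the crossover point here depends entirely on the assumed numbers.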

From the earliest years of our lives, humans use language to express our beliefs and desires. Being able to talk to artificial agents about our preferences would thus fulfill a central goal of value alignment. Yet today, we lack computational models explaining such language use. To address this challenge, we formalize learning from language in a contextual bandit setting and ask how a human might communicate preferences over behaviors. We study two distinct types of language: $\textit{instructions}$, which provide information about the desired policy, and $\textit{descriptions}$, which provide information about the reward function. We show that the agent's degree of autonomy determines which form of language is optimal: instructions are better in low-autonomy settings, but descriptions are better when the agent will need to act independently. We then define a pragmatic listener agent that robustly infers the speaker's reward function by reasoning about $\textit{how}$ the speaker expresses themselves. We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts human behavior, and (2) our pragmatic listener successfully recovers humans' reward functions. Finally, we show that this form of social learning can integrate with and reduce regret in traditional reinforcement learning. We hope these insights facilitate a shift from developing agents that $\textit{obey}$ language to agents that $\textit{learn}$ from it.
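A pragmatic listener of the kind described above can be sketched in the rational-speech-acts style: a speaker picks utterances with probability proportional to exp(β · value), and the listener inverts that speaker model with Bayes' rule. This is an assumed toy version, not the authors' implementation. The hypothesis space (weights in {-1, +1} over two features), the restriction that descriptions must be true, β = 3, the uniform prior, and every function name are hypothetical choices made for illustration.

```python
import math
from itertools import product

# Assumed toy setup: two features, each reward weight is -1 or +1.
HYPOTHESES = list(product([-1, 1], repeat=2))
CONTEXT = [(1, 0), (0, 1), (0, 0)]  # available actions as feature vectors
BETA = 3.0                          # assumed speaker rationality

def reward(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))

def utterances(num_actions, num_features):
    """Instructions name an action; descriptions state a feature's sign."""
    instr = [("instruction", i) for i in range(num_actions)]
    desc = [("description", k, s) for k in range(num_features) for s in (-1, 1)]
    return instr + desc

def is_true(utt, w):
    # Assumption: descriptions must be truthful; any instruction may be spoken.
    return utt[0] == "instruction" or w[utt[1]] == utt[2]

def literal_value(utt, w, context):
    """True reward a literal listener earns after hearing utt (ties split uniformly)."""
    if utt[0] == "instruction":
        return reward(w, context[utt[1]])
    w_hat = [0] * len(w)
    w_hat[utt[1]] = utt[2]
    scores = [reward(w_hat, phi) for phi in context]
    best = max(scores)
    ties = [phi for phi, s in zip(context, scores) if s == best]
    return sum(reward(w, phi) for phi in ties) / len(ties)

def speaker(w, context, beta=BETA):
    """Softmax speaker: P(u | w) proportional to exp(beta * value) over true utterances."""
    utts = [u for u in utterances(len(context), len(w)) if is_true(u, w)]
    weights = [math.exp(beta * literal_value(u, w, context)) for u in utts]
    z = sum(weights)
    return {u: x / z for u, x in zip(utts, weights)}

def pragmatic_listener(utt, context, beta=BETA):
    """Posterior over HYPOTHESES given utt, with a uniform prior."""
    scores = {w: speaker(w, context, beta).get(utt, 0.0) for w in HYPOTHESES}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}
```

In this setup, hearing the instruction to take action `(0, 1)` makes `(-1, 1)` the most probable reward hypothesis. Hearing "feature 0 is positive" rules out both hypotheses with a negative weight on feature 0 and favors `(1, -1)` over `(1, 1)`: under `(1, 1)` the speaker has several equally good alternatives, so choosing that particular description is weaker evidence for it. A literal listener would treat the two consistent hypotheses as equally likely.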

Code Implementations: 1 repo
