LG · CL · Jun 12, 2025

Provably Learning from Language Feedback

arXiv:2506.10341v1 · 9 citations · h-index: 24
Originality: Highly original
AI Analysis

This work addresses the lack of a principled framework for AI agents that learn interactively from language feedback. The authors present it as a first step toward such algorithms.

The paper formalizes learning from language feedback as the Learning from Language Feedback (LLF) problem and introduces a complexity measure, the transfer eluder dimension, to characterize how hard an LLF problem is. It then develops a no-regret algorithm, HELiX, whose performance guarantees scale with this dimension, and shows cases where learning from language feedback can be exponentially faster than learning from reward.

Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. While impressive empirical demonstrations have been shown, so far a principled framing of these decision problems remains lacking. In this paper, we formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to enable learning despite latent rewards, and introduce $\textit{transfer eluder dimension}$ as a complexity measure to characterize the hardness of LLF problems. We show that transfer eluder dimension captures the intuition that information in the feedback changes the learning complexity of the LLF problem. We demonstrate cases where learning from rich language feedback can be exponentially faster than learning from reward. We develop a no-regret algorithm, called $\texttt{HELiX}$, that provably solves LLF problems through sequential interactions, with performance guarantees that scale with the transfer eluder dimension of the problem. Across several empirical domains, we show that $\texttt{HELiX}$ performs well even when repeatedly prompting LLMs does not work reliably. Our contributions mark a first step towards designing principled interactive learning algorithms from generic language feedback.
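The claim that rich feedback can be exponentially faster than reward can be illustrated with a toy problem. The sketch below is not taken from the paper and does not implement HELiX. It assumes a hypothetical task of identifying a hidden n-bit string, where "reward" is a 0/1 signal for an exact match and "rich feedback" is a list of wrong positions standing in for a language critique. The function names and the task itself are assumptions made purely for illustration.

```python
from itertools import product


def learn_from_reward(secret):
    """Enumerate candidates; the only signal is a 0/1 reward for an exact match."""
    n = len(secret)
    for rounds, guess in enumerate(product((0, 1), repeat=n), start=1):
        reward = int(guess == secret)
        if reward == 1:
            return rounds
    raise AssertionError("secret must be a tuple of 0/1 values")


def learn_from_rich_feedback(secret):
    """Feedback names every wrong position, so a single correction suffices."""
    n = len(secret)
    guess = [0] * n
    rounds = 0
    while True:
        rounds += 1
        # Hypothetical stand-in for a critique such as "bits 2 and 5 are wrong".
        wrong = [i for i in range(n) if guess[i] != secret[i]]
        if not wrong:
            return rounds
        for i in wrong:
            guess[i] ^= 1  # flip each bit the feedback flagged


# Worst case for the enumeration order used above: the all-ones string.
secret = (1,) * 10
reward_rounds = learn_from_reward(secret)
feedback_rounds = learn_from_rich_feedback(secret)
print(reward_rounds, feedback_rounds)
```

In this toy setting the reward-only learner can need up to 2^n rounds, while the learner with position-level feedback needs at most two, whatever n is. The paper's formal results are stated in terms of the transfer eluder dimension, which this sketch does not compute.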
