AI · LG · Sep 30, 2021

Reinforcement Learning with Information-Theoretic Actuation

arXiv:2109.15147v1 · 1 citation
Originality: Highly original
AI Analysis

This work provides a theoretical framework for using large sequence models as prior knowledge in multi-task reinforcement learning.

The paper tackles the problem of formalizing actions in reinforcement learning as internal choices rather than external outputs, and shows that this leads to self-consistent definitions of action value functions.

Reinforcement Learning formalises an embodied agent's interaction with the environment through observations, rewards and actions. But where do the actions come from? Actions are often considered to represent something external, such as the movement of a limb, a chess piece, or more generally, the output of an actuator. In this work we explore and formalize a contrasting view, namely that actions are best thought of as the output of a sequence of internal choices with respect to an action model. This view is particularly well-suited for leveraging the recent advances in large sequence models as prior knowledge for multi-task reinforcement learning problems. Our main contribution in this work is to show how to augment the standard MDP formalism with a sequential notion of internal action using information-theoretic techniques, and that this leads to self-consistent definitions of both internal and external action value functions.
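
As a rough illustration of what a sequential notion of internal action could look like, the sketch below reaches each external action through a sequence of binary internal choices and checks that the internal and external values agree. It is not the paper's construction: the action names, the values in `Q_EXT`, the fixed prefix code in `CODE`, and the greedy (max-based) definition of internal value are all assumptions made for this example, and the fixed code stands in for the information-theoretic machinery that the abstract mentions but does not spell out.

```python
# Illustrative only: every name and number here is a hypothetical assumption,
# not taken from the paper.

# Assumed external action values Q_ext(s, a) for a single fixed state s.
Q_EXT = {"left": 1.0, "right": 3.0, "up": 2.0, "down": 0.5}

# Assumed prefix-free binary code: each external action is the result of a
# sequence of internal binary choices.
CODE = {"left": "00", "right": "01", "up": "10", "down": "11"}


def internal_q(prefix: str) -> float:
    """Value of a partial sequence of internal choices under greedy completion."""
    reachable = [Q_EXT[a] for a, code in CODE.items() if code.startswith(prefix)]
    if not reachable:
        raise ValueError(f"no external action is reachable from prefix {prefix!r}")
    return max(reachable)


def greedy_external_action() -> str:
    """Choose one bit at a time by internal value until a codeword is complete."""
    decode = {code: action for action, code in CODE.items()}
    prefix = ""
    while prefix not in decode:
        prefix = max((prefix + "0", prefix + "1"), key=internal_q)
    return decode[prefix]


if __name__ == "__main__":
    print(greedy_external_action())
```

In this toy setting the two kinds of value are consistent in a simple sense: the internal value of a complete codeword equals the external value of the action it encodes, and the internal value of the empty prefix equals the best external value.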

