LG · AI · Dec 29, 2022

On the Geometry of Reinforcement Learning in Continuous State and Action Spaces

arXiv:2301.00009v2 · h-index: 50
Originality: Highly original
AI Analysis

The paper provides a foundational geometric insight for RL theory in continuous state and action spaces, where most existing theoretical work covers only the finite case.

The paper addresses the lack of theoretical understanding of reinforcement learning in continuous state and action spaces. It proves that, under certain conditions, the transition dynamics induce a low-dimensional manifold of reachable states whose dimensionality is at most the dimensionality of the action space plus one. It corroborates this bound empirically in four MuJoCo environments and shows that a policy learned in the low-dimensional representation performs on par or better on four MuJoCo control tasks.

Advances in reinforcement learning have led to its successful application in complex tasks with continuous state and action spaces. Despite these advances in practice, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens. Central to our work is the idea that the transition dynamics induce a low-dimensional manifold of reachable states embedded in the high-dimensional nominal state space. We prove that, under certain conditions, the dimensionality of this manifold is at most the dimensionality of the action space plus one. This is the first result of its kind, linking the geometry of the state space to the dimensionality of the action space. We empirically corroborate this upper bound for four MuJoCo environments. We further demonstrate the applicability of our result by learning a policy in this low-dimensional representation. To do so, we introduce an algorithm that learns a mapping to a low-dimensional representation, as a narrow hidden layer of a deep neural network, in tandem with the policy using DDPG. Our experiments show that a policy learnt this way performs on par or better for four MuJoCo control suite tasks.
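
The "narrow hidden layer" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal NumPy forward pass for a hypothetical actor network whose bottleneck width is set to the action dimensionality plus one, matching the stated upper bound. The function names (`init_actor`, `actor_forward`), the hidden width of 64, the tanh activations, the linear bottleneck, and the example dimensions are all assumptions chosen for illustration. The DDPG training loop itself (critic, target networks, replay buffer) is omitted.

```python
import numpy as np


def init_actor(state_dim, action_dim, hidden=64, seed=0):
    """Initialise weights for a hypothetical actor with a narrow bottleneck.

    Assumption: the bottleneck width is action_dim + 1, i.e. the upper bound
    on the manifold dimensionality stated in the abstract.
    """
    rng = np.random.default_rng(seed)
    bottleneck = action_dim + 1
    sizes = [state_dim, hidden, bottleneck, hidden, action_dim]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params.append((weights, np.zeros(n_out)))
    return params


def actor_forward(params, states):
    """Return (actions, codes) for a batch of states.

    `codes` is the output of the narrow layer: the learned low-dimensional
    representation on which the rest of the policy operates.
    """
    h = states
    codes = None
    last = len(params) - 1
    for i, (weights, bias) in enumerate(params):
        h = h @ weights + bias
        if i == 1:
            codes = h              # linear bottleneck (an assumption)
        elif i < last:
            h = np.tanh(h)         # hidden nonlinearity (an assumption)
    return np.tanh(h), codes       # squash actions into [-1, 1]


# Illustrative dimensions only: a 17-dimensional state, a 6-dimensional action.
params = init_actor(state_dim=17, action_dim=6)
states = np.random.default_rng(1).normal(size=(4, 17))
actions, codes = actor_forward(params, states)
print(actions.shape, codes.shape)  # (4, 6) (4, 7)
```

In a full DDPG setup, the gradient of the critic with respect to the actions would flow back through both halves of this network, so the mapping to the bottleneck and the policy on top of it would be trained together, which is the "in tandem" aspect the abstract describes.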

