Spectral Representation-based Reinforcement Learning
This work addresses theoretical and practical issues in reinforcement learning for real-world applications with large state and action spaces. It offers a novel framework that could benefit both researchers and practitioners, though it appears incremental because it builds on existing spectral decomposition concepts.
The paper tackles the challenges of reinforcement learning in large state-action spaces by introducing spectral representations derived from the transition operator. These representations provide theoretical clarity and an effective abstraction for policy optimization. The resulting algorithms achieve performance comparable to or better than state-of-the-art baselines on over 20 tasks from the DeepMind Control Suite.
In real-world applications with large state and action spaces, reinforcement learning (RL) typically employs function approximation to represent core components such as policies, value functions, and dynamics models. Although powerful approximators such as neural networks offer great expressiveness, they often present theoretical ambiguities, suffer from optimization instability and exploration difficulty, and incur substantial computational costs in practice. In this paper, we introduce the perspective of spectral representations as a way to address these difficulties in RL. Stemming from the spectral decomposition of the transition operator, this framework yields an effective abstraction of the system dynamics for subsequent policy optimization while also providing a clear theoretical characterization. We reveal how to construct spectral representations for transition operators that possess latent variable structures or energy-based structures; each structure implies a different learning method for extracting spectral representations from data. Notably, each of these learning methods realizes an effective RL algorithm under this framework. We also provably extend this spectral view to partially observable MDPs. Finally, we validate these algorithms on over 20 challenging tasks from the DeepMind Control Suite, where they achieve performance comparable to or better than current state-of-the-art model-free and model-based baselines.
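To make the idea of a spectral representation concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a small tabular MDP whose transition operator has a latent variable structure, P(s'|s,a) = sum_z p(z|s,a) p(s'|z), so that it factorizes as an inner product of phi(s,a) and mu(s'). The names `phi`, `mu`, the problem sizes, the random rewards, and the uniform policy are all illustrative assumptions. Under this assumed factorization, the sketch checks numerically that the Q-function of a fixed policy equals the reward plus a term that is linear in phi(s,a).

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_latent = 6, 2, 3   # illustrative sizes (assumption)
gamma = 0.9

# Assumed latent variable structure:
#   phi[s, a, z] = p(z | s, a),  mu[z, s'] = p(s' | z)
phi = rng.dirichlet(np.ones(n_latent), size=(n_states, n_actions))  # (S, A, Z)
mu = rng.dirichlet(np.ones(n_states), size=n_latent)                # (Z, S)

# Transition operator P(s' | s, a) = <phi(s, a), mu(s')>, shape (S, A, S)
P = phi @ mu

# Arbitrary rewards and a uniform policy, chosen only for illustration
r = rng.uniform(size=(n_states, n_actions))
pi = np.full((n_states, n_actions), 1.0 / n_actions)

# Exact policy evaluation on the full state-action space:
#   Q = r + gamma * P_pi Q,  with  P_pi[(s,a),(s',a')] = P(s'|s,a) * pi(a'|s')
sa = n_states * n_actions
P_pi = np.einsum("sap,pb->sapb", P, pi).reshape(sa, sa)
Q_exact = np.linalg.solve(np.eye(sa) - gamma * P_pi, r.reshape(-1))
Q_exact = Q_exact.reshape(n_states, n_actions)

# Spectral view: the expected next-state value is linear in phi(s, a).
V = (pi * Q_exact).sum(axis=1)   # V(s') = sum_a' pi(a'|s') Q(s', a')
w = mu @ V                       # w[z] = sum_s' p(s'|z) V(s'), shape (Z,)
Q_spectral = r + gamma * (phi @ w)

print("max |Q_exact - Q_spectral| =", np.abs(Q_exact - Q_spectral).max())
print("rank of P as an (S*A) x S matrix =",
      np.linalg.matrix_rank(P.reshape(sa, n_states)))
```

In this toy setting the weight vector `w` has only `n_latent` entries, however many states there are, which is the sense in which the representation abstracts the dynamics for policy evaluation. In the large-scale setting the abstract describes, phi would have to be learned from data rather than constructed by hand, and this sketch does not attempt that.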