LG, AI | Oct 12, 2021

Action-Sufficient State Representation Learning for Control with Structural Constraints

Biwei Huang, Chaochao Lu, Liu Leqi, José Miguel Hernández-Lobato, Clark Glymour, Bernhard Schölkopf, Kun Zhang

arXiv:2110.05721v2 | 14.643 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving computational efficiency and generalization in decision-making tasks for AI systems in real-world scenarios. It represents an incremental advance in representation learning for control.

The paper tackles high-dimensional, noisy signals in partially observable environments by learning a minimal set of state representations that contain sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). It reports a clear advantage for policy learning with ASRs on CarRacing and VizDoom.

Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and using their representation that contains essential and sufficient information required by downstream decision-making tasks will help improve computational efficiency and generalization ability in the tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space to improve sample efficiency.
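
The abstract does not spell out how ASRs are read off the structural model, so the following is only an illustrative sketch under a stated assumption: a latent state dimension is treated as action-sufficient when it has a directed path to the reward in the learned structure. The binary masks `state_to_state` and `state_to_reward`, and the function name `action_sufficient_dims`, are hypothetical and not taken from the paper.

```python
from collections import deque


def action_sufficient_dims(state_to_state, state_to_reward):
    """Return sorted indices of state dimensions with a directed path to reward.

    state_to_state[i][j] == 1 means s_i at time t has an edge to s_j at t+1.
    state_to_reward[i] == 1 means s_i at time t has an edge to the next reward.
    Both masks are hypothetical inputs used for illustration only.
    """
    n = len(state_to_reward)
    # Start from the dimensions that affect the reward directly.
    selected = {i for i in range(n) if state_to_reward[i]}
    queue = deque(selected)
    # Walk edges backwards: a parent of a selected dimension is also selected.
    while queue:
        j = queue.popleft()
        for i in range(n):
            if state_to_state[i][j] and i not in selected:
                selected.add(i)
                queue.append(i)
    return sorted(selected)


# Toy structure with four latent dimensions (made-up values).
state_to_state = [
    [1, 1, 0, 0],  # s0 -> s0, s1
    [0, 1, 0, 0],  # s1 -> s1
    [0, 0, 1, 0],  # s2 -> s2
    [0, 0, 1, 1],  # s3 -> s2, s3
]
state_to_reward = [0, 1, 0, 0]  # only s1 feeds the reward directly

asr = action_sufficient_dims(state_to_state, state_to_reward)
print(asr)  # [0, 1]
```

In this toy case s1 affects the reward directly and s0 affects it through s1, while s2 and s3 never reach the reward and would be dropped from the policy input.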
