LG AI Dec 30, 2022

Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search

Wenqing Zheng, S P Sharan, Zhiwen Fan, Kevin Wang, Yihan Xi, Zhangyang Wang

arXiv:2212.14849v113.016 citations h-index: 29 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of interpretability and scalability in visual RL for applications that require efficient deployment, though it appears incremental in that it combines symbolic and neural approaches.

The paper tackles the problem of learning interpretable and efficient policies in visual reinforcement learning by proposing DiffSES, a framework that discovers discrete symbolic policies using object-level abstractions and differentiable optimization, resulting in simpler and more scalable policies than state-of-the-art symbolic RL methods.

Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes. While neural networks have achieved competitive performance, the resulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More recent symbolic RL frameworks have shown that high-level domain-specific programming logic can be designed to handle both policy learning and symbolic planning. However, these approaches rely on coded primitives with little feature learning, and when applied to high-dimensional visual scenes, they can suffer from scalability issues and perform poorly when images have complex object interactions. To address these challenges, we propose Differentiable Symbolic Expression Search (DiffSES), a novel symbolic learning approach that discovers discrete symbolic policies using partially differentiable optimization. By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization. Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods, with a reduced amount of symbolic prior knowledge.
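
To make the idea of a symbolic policy over object-level abstractions more concrete, the following is a minimal illustrative sketch. It is not the DiffSES implementation and does not show its search or optimization procedure. The names `ObjectState` and `symbolic_policy`, the object keys `"ball"` and `"paddle"`, the action encoding, and the constant `c` are all hypothetical and chosen only for illustration. The sketch assumes some upstream detector has already reduced an image to per-object coordinates.

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical object-level state: one entry per detected object.
# Field names are illustrative, not taken from the paper.
@dataclass
class ObjectState:
    x: float
    y: float


def symbolic_policy(objects: Dict[str, ObjectState], c: float = 0.5) -> int:
    """Evaluate a small hand-written symbolic expression on object coordinates.

    Assumed action encoding: 0 = stay, 1 = move left, 2 = move right.
    The constant c is the kind of numeric parameter that could in
    principle be tuned; here it is simply fixed by hand.
    """
    # Expression: compare (ball.x - paddle.x) against a dead zone of half-width c.
    diff = objects["ball"].x - objects["paddle"].x
    if diff > c:
        return 2
    if diff < -c:
        return 1
    return 0


# Example: the ball is to the right of the paddle, so the policy moves right.
state = {
    "ball": ObjectState(x=3.0, y=5.0),
    "paddle": ObjectState(x=1.0, y=0.0),
}
action = symbolic_policy(state)
```

The point of the sketch is the contrast with a pixel-level network: the whole policy is one short, readable expression over a handful of object attributes, which is the kind of simplicity the abstract attributes to symbolic policies.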
