AI LG Apr 28, 2022

Bilinear value networks

arXiv:2204.13695v311.210 citations h-index: 40 Has Code

Originality: Incremental advance

AI Analysis

This work targets data efficiency and generalization in multi-goal reinforcement learning and represents an incremental improvement over existing methods.

The paper proposes a bilinear decomposition of the Q-value function for off-policy multi-goal reinforcement learning. On simulated Fetch robot and Shadow hand tasks, the authors report that the decomposition improves data efficiency and transfers better to out-of-distribution goals than prior methods.

The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s, whereas the second, φ(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task suite and on dexterous manipulation with a Shadow hand.
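The decomposition described in the abstract, Q(s, a, g) = f(s, a) · φ(s, g), can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`init_mlp`, `mlp`, `bilinear_q`), the layer sizes, the tanh activation, the embedding width, and the randomly initialized weights are all assumptions chosen for illustration, and no training loop is shown.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Hypothetical helper: random weights for a small fully connected network."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def mlp(params, x):
    """Forward pass: tanh hidden layers, linear output layer (an assumed architecture)."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def bilinear_q(f_params, phi_params, s, a, g):
    """Q(s, a, g) as a dot product between f(s, a) and phi(s, g)."""
    f_out = mlp(f_params, np.concatenate([s, a], axis=-1))      # f(s, a): depends on state and action
    phi_out = mlp(phi_params, np.concatenate([s, g], axis=-1))  # phi(s, g): depends on state and goal
    return np.sum(f_out * phi_out, axis=-1)                     # row-wise dot product

# Illustrative dimensions only; they are not taken from the paper.
rng = np.random.default_rng(0)
state_dim, action_dim, goal_dim, embed_dim = 10, 4, 3, 16
f_params = init_mlp(rng, [state_dim + action_dim, 64, embed_dim])
phi_params = init_mlp(rng, [state_dim + goal_dim, 64, embed_dim])

batch = 5
s = rng.normal(size=(batch, state_dim))
a = rng.normal(size=(batch, action_dim))
g = rng.normal(size=(batch, goal_dim))

q = bilinear_q(f_params, phi_params, s, a, g)  # one Q-value per (s, a, g) row
```

In this sketch the goal enters only through φ and the action only through f, so for a fixed state the table of Q-values over actions and goals has rank at most `embed_dim`. That is the sense in which the dot product acts as a low-rank approximation.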
