SY, AI, LG | Oct 28, 2021

Cooperative Deep $Q$-learning Framework for Environments Providing Image Feedback

arXiv:2110.15305v1
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in deep reinforcement learning for environments with image feedback, but it appears incremental as it builds on existing methods with specific improvements.

The paper tackles sample inefficiency and slow learning in deep reinforcement learning by proposing a dual neural network approach with a temporal difference error-driven learning method, achieving faster convergence and reduced buffer size in simulations.

In this paper, we address two key challenges in the deep reinforcement learning setting, sample inefficiency and slow learning, with a dual NN-driven learning approach. In the proposed approach, we use two deep NNs with independent initialization to robustly approximate the action-value function in the presence of image inputs. In particular, we develop a temporal difference (TD) error-driven learning approach, where we introduce a set of linear transformations of the TD error to directly update the parameters of each layer in the deep NN. We demonstrate theoretically that the cost minimized by the error-driven learning (EDL) regime is an approximation of the empirical cost and that the approximation error reduces as learning progresses, irrespective of the size of the network. Using simulation analysis, we show that the proposed methods enable faster learning and convergence and require a reduced buffer size (thereby increasing sample efficiency).
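
As a rough illustration of the TD error that the abstract builds on, the sketch below computes a one-step TD error with two independently initialized approximators, one learning and the other supplying the bootstrap target. It is not the paper's EDL method: it assumes small linear approximators instead of deep NNs on images, and it applies the ordinary semi-gradient Q-learning step rather than layer-wise linear transformations of the TD error. All names (`q_values`, `td_error`, `td_update`), the toy transition, and the hyperparameters are hypothetical.

```python
import numpy as np


def q_values(weights, state):
    """Linear action-value estimate: one weight row per action (an assumption)."""
    return weights @ state


def td_error(w_learner, w_target, state, action, reward, next_state, gamma=0.99):
    """One-step TD error, bootstrapping from the second approximator."""
    bootstrap = reward + gamma * np.max(q_values(w_target, next_state))
    return bootstrap - q_values(w_learner, state)[action]


def td_update(w_learner, w_target, transition, lr=0.1, gamma=0.99):
    """Standard semi-gradient step on the learner; the target is left untouched."""
    state, action, reward, next_state = transition
    delta = td_error(w_learner, w_target, state, action, reward, next_state, gamma)
    updated = w_learner.copy()
    updated[action] += lr * delta * state
    return updated, delta


# Two approximators with independent random initialization.
rng = np.random.default_rng(0)
n_actions, n_features = 3, 4
w1 = rng.normal(scale=0.1, size=(n_actions, n_features))
w2 = rng.normal(scale=0.1, size=(n_actions, n_features))

# A made-up transition: (state, action, reward, next_state).
transition = (
    np.array([1.0, 0.0, 0.5, 0.0]),
    1,
    1.0,
    np.array([0.0, 1.0, 0.0, 0.5]),
)

w1_new, delta = td_update(w1, w2, transition)
delta_after = td_error(w1_new, w2, *transition)
print(abs(delta_after) < abs(delta))
```

Because the target approximator is held fixed during the step, repeating the update on the same transition shrinks the TD error geometrically in this linear toy setting. Which network plays which role, and when the roles change, is left open here.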

