LG
Oct 22, 2025

An Integrated Approach to Neural Architecture Search for Deep Q-Networks

arXiv:2510.19872v1
h-index: 7
Originality: Highly original
AI Analysis

This work addresses the need for more efficient and adaptive deep reinforcement learning agents, offering a novel approach that could reduce reliance on static, offline architecture design.

The paper tackles the constraint that a fixed neural network architecture places on deep reinforcement learning by introducing NAS-DQN, an agent that dynamically reconfigures its architecture during training. Compared with fixed-architecture baselines, it reports superior final performance, sample efficiency, and policy stability with negligible computational overhead.

The performance of deep reinforcement learning agents is fundamentally constrained by their neural network architecture, a choice traditionally made through expensive hyperparameter searches and then fixed throughout training. This work investigates whether online, adaptive architecture optimization can escape this constraint and outperform static designs. We introduce NAS-DQN, an agent that integrates a learned neural architecture search controller directly into the DRL training loop, enabling dynamic network reconfiguration based on cumulative performance feedback. We evaluate NAS-DQN against three fixed-architecture baselines and a random search control on a continuous control task, conducting experiments over multiple random seeds. Our results demonstrate that NAS-DQN achieves superior final performance, sample efficiency, and policy stability while incurring negligible computational overhead. Critically, the learned search strategy substantially outperforms both undirected random architecture exploration and poorly chosen fixed designs, indicating that intelligent, performance-guided search is the key mechanism driving success. These findings establish that architecture adaptation is not merely beneficial but necessary for optimal sample efficiency in online deep reinforcement learning, and suggest that the design of RL agents need not be a static offline choice but can instead be seamlessly integrated as a dynamic component of the learning process itself.
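To make the idea of a controller driven by performance feedback more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a small fixed set of candidate hidden-layer configurations (`CANDIDATES`), a softmax gradient-bandit controller (`ArchController`), and a synthetic stand-in for DQN training (`train_segment`). All of these names, the search space, and the update rule are hypothetical; the abstract does not specify the controller's form.

```python
import math
import random

# Hypothetical search space: hidden-layer widths for a Q-network.
CANDIDATES = [(32,), (64, 64), (128, 128), (256, 256, 256)]


class ArchController:
    """Softmax (gradient-bandit) controller over a fixed set of architectures."""

    def __init__(self, n_arms, lr=0.1, seed=0):
        self.prefs = [0.0] * n_arms  # one preference score per architecture
        self.baseline = 0.0          # running mean of all observed feedback
        self.steps = 0
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self):
        # Numerically stable softmax over the preference scores.
        m = max(self.prefs)
        exps = [math.exp(p - m) for p in self.prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self):
        # Draw an architecture index according to the current policy.
        return self.rng.choices(range(len(self.prefs)), weights=self.probs())[0]

    def update(self, arm, feedback):
        # REINFORCE-style update using a running-mean baseline.
        self.steps += 1
        self.baseline += (feedback - self.baseline) / self.steps
        advantage = feedback - self.baseline
        p = self.probs()
        for i in range(len(self.prefs)):
            grad = (1.0 - p[i]) if i == arm else -p[i]
            self.prefs[i] += self.lr * advantage * grad


def train_segment(arch, rng):
    """Stand-in for training a DQN with `arch` for a few episodes.

    Returns a synthetic mean return. A real agent would rebuild its
    Q-network here and report the return it actually observed.
    """
    assumed_quality = {(32,): 0.2, (64, 64): 0.8,
                       (128, 128): 0.6, (256, 256, 256): 0.4}
    return assumed_quality[arch] + rng.gauss(0.0, 0.1)


def run(n_segments=500, seed=0):
    # Alternate between choosing an architecture and training with it.
    rng = random.Random(seed)
    controller = ArchController(len(CANDIDATES), seed=seed)
    for _ in range(n_segments):
        arm = controller.sample()
        feedback = train_segment(CANDIDATES[arm], rng)
        controller.update(arm, feedback)
    return controller
```

Calling `run().probs()` returns the controller's final distribution over `CANDIDATES`. Replacing `controller.sample()` with a uniform draw gives something like the random search control mentioned in the abstract, again only under the assumptions of this sketch.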

