A Convolution and Attention Based Encoder for Reinforcement Learning under Partial Observability
This work addresses incomplete state information for reinforcement learning agents and offers an incremental improvement in scalability under uncertainty.
The paper tackles partial observability in reinforcement learning by reformulating POMDPs as fully observable processes whose states are fixed-length observation histories. It proposes a lightweight temporal encoder based on depthwise separable convolution and self-attention, and reports superior performance on continuous control benchmarks.
Partially Observable Markov Decision Processes (POMDPs) remain a core challenge in reinforcement learning because the agent never sees the full state. We address this by reformulating POMDPs as fully observable processes that use fixed-length observation histories as augmented states. To encode these histories efficiently, we propose a lightweight temporal encoder based on depthwise separable convolution and self-attention, avoiding the overhead of recurrent and Transformer-based models. Integrated into an actor-critic framework, our method achieves superior performance on continuous control benchmarks under partial observability. More broadly, this work shows that lightweight temporal encoding can improve the scalability of AI systems under uncertainty. It advances the development of agents that can reason robustly in real-world environments where information is incomplete or delayed.
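
The reformulation described above, in which a fixed-length observation history serves as the augmented state, can be illustrated with a minimal sketch. The class name `HistoryState`, the parameters `k` and `obs_dim`, and the zero-padding at episode start are all assumptions made for illustration; the abstract does not specify how the history is stored or padded, and the encoder itself is not shown here.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class HistoryState:
    """Hypothetical helper: keeps the last k observations as one augmented state."""

    def __init__(self, k: int, obs_dim: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.obs_dim = obs_dim
        # deque(maxlen=k) silently drops the oldest entry once it is full.
        self._buffer: Deque[Tuple[float, ...]] = deque(maxlen=k)
        self.reset()

    def reset(self) -> None:
        # Assumption: pad with zero observations so the state always has k rows.
        self._buffer.clear()
        for _ in range(self.k):
            self._buffer.append((0.0,) * self.obs_dim)

    def push(self, obs: Sequence[float]) -> List[Tuple[float, ...]]:
        # Append the newest observation and return the updated augmented state.
        if len(obs) != self.obs_dim:
            raise ValueError("observation has the wrong dimension")
        self._buffer.append(tuple(float(x) for x in obs))
        return self.state()

    def state(self) -> List[Tuple[float, ...]]:
        # Oldest observation first, newest last: a k x obs_dim history window.
        return list(self._buffer)


history = HistoryState(k=3, obs_dim=2)
history.push([1.0, 0.5])
augmented = history.push([2.0, 0.25])
print(augmented)
```

In a setup like this, the policy and value networks would consume the whole window returned by `state()` rather than the latest observation alone, and a temporal encoder would map that window to a fixed-size feature vector.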