Locality Sensitive Sparse Encoding for Learning World Models Online
The paper addresses data nonstationarity faced by lifelong reinforcement learning agents and offers an incremental improvement over existing methods.
The paper tackles catastrophic forgetting in online model-based reinforcement learning by proposing a linear regression model on nonlinear random features that supports efficient Follow-The-Leader (FTL) updates. The results show that this world model, trained on a single pass of trajectory data, matches or surpasses deep world models trained with replay and continual learning methods.
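Why a linear model on fixed features makes FTL cheap can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `RandomFeatureFTL`, the choice of random Fourier features, and the ridge penalty `lam` are all hypothetical. The key point is that ridge regression only needs the running sums `A` and `B`, so each update yields the exact fit to all data seen so far without storing or replaying past samples.

```python
import numpy as np


class RandomFeatureFTL:
    """Hypothetical online ridge regression on fixed random features.

    Keeps sufficient statistics A = lam*I + sum(phi phi^T) and B = sum(phi y^T),
    so after every update theta is the exact ridge minimizer over all data seen
    so far (Follow-The-Leader), with no replay buffer.
    """

    def __init__(self, in_dim, out_dim, n_features=256, lam=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features: an assumed choice of nonlinear random feature.
        self.W = rng.normal(size=(in_dim, n_features))
        self.b = rng.uniform(0.0, 2 * np.pi, size=n_features)
        self.A = lam * np.eye(n_features)
        self.B = np.zeros((n_features, out_dim))
        self.theta = np.zeros((n_features, out_dim))

    def features(self, x):
        # Fixed (never trained) nonlinear map; only theta is learned.
        return np.cos(x @ self.W + self.b)

    def update(self, x, y):
        phi = self.features(np.atleast_2d(x))
        self.A += phi.T @ phi
        self.B += phi.T @ np.atleast_2d(y)
        # Re-solve for clarity; a rank-one inverse update would cost O(d^2) instead.
        self.theta = np.linalg.solve(self.A, self.B)

    def predict(self, x):
        return self.features(np.atleast_2d(x)) @ self.theta
```

Streaming samples one at a time through `update` gives the same weights as refitting ridge regression on the full history, which is exactly what a neural network would need full retraining to achieve.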
Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model, which optimally fits all previous experience at each round, is desirable. Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents. In this paper, we revisit models that can achieve FTL with incremental updates. Specifically, our world model is a linear regression model supported by nonlinear random features. The linear part ensures efficient FTL updates, while the nonlinear random features enable fitting complex environments. To best trade off model capacity and computational efficiency, we introduce a locality sensitive sparse encoding, which allows us to conduct efficient sparse updates even with very high-dimensional nonlinear features. We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods.
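The benefit of a sparse encoding for the FTL statistics can be illustrated with a small sketch. The abstract does not specify the encoding, so the scheme below is an assumption and not necessarily the paper's: a hypothetical `sparse_encode` applies a fixed random projection and ReLU, then keeps only the `k` largest activations, so nearby inputs tend to share active units. With only `k` active features, the hypothetical `sparse_ftl_update` touches a `k x k` block of `A` and `k` rows of `B` rather than the full `d x d` matrix; how the weights themselves are refreshed cheaply is not described in this abstract.

```python
import numpy as np


def sparse_encode(x, proj, bias, k):
    """Hypothetical sparse random-feature encoding (assumed scheme).

    Projects x with a fixed random matrix, applies ReLU, and keeps only the
    k largest activations. Returns (active indices, their values).
    """
    h = np.maximum(x @ proj + bias, 0.0)
    idx = np.argpartition(h, -k)[-k:]  # indices of the k largest activations
    return idx, h[idx]


def sparse_ftl_update(A, B, idx, vals, y):
    """Add one sample's sufficient statistics, touching only active entries.

    Equivalent to A += phi phi^T and B += phi y^T for the dense code phi,
    but costs O(k^2) instead of O(d^2).
    """
    A[np.ix_(idx, idx)] += np.outer(vals, vals)
    B[idx] += np.outer(vals, y)
```

For example, with `d = 1000` features and `k = 16`, each sample updates at most 256 entries of `A` instead of one million, which is what makes very high-dimensional feature spaces affordable.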