LG · DS · Mar 27

Automatic feature identification in least-squares policy iteration using the Koopman operator framework

arXiv:2603.26464 · 11.7 · h-index: 2
AI Analysis

This work addresses the feature selection bottleneck in reinforcement learning for control problems, offering an incremental improvement over existing methods.

The paper tackles the lack of a systematic way to select features in linear reinforcement learning by introducing a Koopman autoencoder-based least-squares policy iteration algorithm that learns features automatically. On stochastic chain walk and inverted pendulum control problems, it converges to optimal or near-optimal policies comparably to existing methods while using a reasonable number of features.

In this paper, we present a Koopman autoencoder-based least-squares policy iteration (KAE-LSPI) algorithm for reinforcement learning (RL). The KAE-LSPI algorithm is based on reformulating the so-called least-squares fixed-point approximation method in terms of extended dynamic mode decomposition (EDMD), thereby enabling automatic feature learning via the Koopman autoencoder (KAE) framework. The approach is motivated by the lack of a systematic choice of features or kernels in linear RL techniques. We compare the KAE-LSPI algorithm with two previous methods, the classical least-squares policy iteration (LSPI) and the kernel-based least-squares policy iteration (KLSPI), using the stochastic chain walk and inverted pendulum control problems as examples. Unlike these previous methods, our approach does not require features or kernels to be fixed a priori. Empirical results show that the number of features learned by the KAE technique remains reasonable compared to the number fixed in the classical LSPI algorithm. Convergence to an optimal or near-optimal policy is also comparable to that of the other two methods.
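The abstract's key step, recasting the least-squares fixed-point (LSTD-Q) evaluation inside LSPI as an EDMD regression, can be illustrated with a small sketch. Write the sampled state-action features row-wise as Φ and the next-state features under the current greedy policy as Φ′. EDMD fits a matrix K with Φ′ ≈ ΦK by least squares; projecting the reward onto the features as c = (ΦᵀΦ)⁻¹Φᵀr, the projected Bellman equation becomes w = c + γKw. When ΦᵀΦ is invertible, this is algebraically the same as the LSTD-Q system Φᵀ(Φ − γΦ′)w = Φᵀr, because Φᵀ(Φ − γΦ′) = ΦᵀΦ(I − γK). Everything concrete below is a hypothetical choice for illustration, not the authors' implementation: the chain-walk size, the reward states, the 0.9 success probability, the sample count, and the hand-picked polynomial `features`, which stand in for the features a Koopman autoencoder would learn.

```python
import numpy as np

# Hypothetical chain-walk setup for illustration (not the paper's parameters).
N_STATES, N_ACTIONS, GAMMA = 10, 2, 0.9
REWARD_STATES = {1, 8}      # states that pay reward 1
rng = np.random.default_rng(0)


def step(s, a):
    """Chain-walk transition: the intended move (0 = left, 1 = right) succeeds
    with probability 0.9; otherwise the agent moves the opposite way."""
    move = 1 if a == 1 else -1
    if rng.random() > 0.9:
        move = -move
    s_next = min(max(s + move, 0), N_STATES - 1)
    reward = 1.0 if s in REWARD_STATES else 0.0
    return reward, s_next


def features(s, a):
    """Hand-picked polynomial features (1, x, x^2) per action; a hypothetical
    stand-in for features a Koopman autoencoder would learn."""
    x = s / (N_STATES - 1)
    phi = np.zeros(3 * N_ACTIONS)
    phi[3 * a:3 * a + 3] = [1.0, x, x * x]
    return phi


def greedy(w, s):
    """Greedy action with respect to the linear Q-function features(s, a) @ w."""
    return int(np.argmax([features(s, a) @ w for a in range(N_ACTIONS)]))


# Fixed batch of uniformly random transitions (s, a, reward, s').
samples = []
for _ in range(2000):
    s, a = int(rng.integers(N_STATES)), int(rng.integers(N_ACTIONS))
    samples.append((s, a, *step(s, a)))


def sample_matrices(w):
    """Row-wise Phi, Phi' (next state, greedy action under w) and reward vector."""
    Phi = np.array([features(s, a) for s, a, _, _ in samples])
    Phi_next = np.array([features(s2, greedy(w, s2)) for _, _, _, s2 in samples])
    rewards = np.array([rew for _, _, rew, _ in samples])
    return Phi, Phi_next, rewards


def lstdq(Phi, Phi_next, rewards):
    """Least-squares fixed point: solve Phi^T (Phi - gamma Phi') w = Phi^T r."""
    A = Phi.T @ (Phi - GAMMA * Phi_next)
    return np.linalg.solve(A, Phi.T @ rewards)


def edmd_evaluate(Phi, Phi_next, rewards):
    """Same evaluation via EDMD: fit K with Phi' ~ Phi K, project the reward
    onto the features (c), then solve w = c + gamma K w."""
    K = np.linalg.lstsq(Phi, Phi_next, rcond=None)[0]
    c = np.linalg.lstsq(Phi, rewards, rcond=None)[0]
    return np.linalg.solve(np.eye(K.shape[0]) - GAMMA * K, c)


# LSPI: alternate policy evaluation (LSTD-Q) and greedy policy improvement.
w = np.zeros(3 * N_ACTIONS)
all_match = True
for _ in range(20):                       # cap: LSPI may oscillate between policies
    Phi, Phi_next, rewards = sample_matrices(w)
    w_new = lstdq(Phi, Phi_next, rewards)
    # Check that the EDMD form reproduces the LSTD-Q weights for this policy.
    all_match &= np.allclose(w_new, edmd_evaluate(Phi, Phi_next, rewards))
    converged = np.linalg.norm(w_new - w) < 1e-8
    w = w_new
    if converged:
        break

policy = [greedy(w, s) for s in range(N_STATES)]
print("greedy policy (0=left, 1=right):", policy)
print("LSTD-Q and EDMD weights agree:", all_match)
```

Because the two evaluations coincide for any fixed feature map, the natural place for a learned representation in a KAE-style variant would presumably be the `features` function, with the evaluation and improvement steps left unchanged; how the paper trains and plugs in its autoencoder is not shown here.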

