LG · AI · Oct 16, 2024

Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces

arXiv:2410.12166v1 · 9 citations · h-index: 1 · ICLR
Originality: Incremental advance
AI Analysis

This work addresses the optimization of programmatic policies in reinforcement learning. It offers an incremental advance by demonstrating the advantages of searching in programmatic rather than latent spaces.

The paper shows that programmatic spaces, derived directly from domain-specific languages without any training, achieve behavior loss values similar to those of learned latent spaces. It also shows that search algorithms operating in programmatic spaces outperform the existing systems LEAPS and HPRL at finding programmatic policies for POMDPs.

Recent works have introduced LEAPS and HPRL, systems that learn latent spaces of domain-specific languages, which are used to define programmatic policies for partially observable Markov decision processes (POMDPs). These systems induce a latent space while optimizing losses such as the behavior loss, which aim to achieve locality in program behavior, meaning that vectors close in the latent space should correspond to similarly behaving programs. In this paper, we show that the programmatic space, induced by the domain-specific language and requiring no training, presents values for the behavior loss similar to those observed in latent spaces presented in previous work. Moreover, algorithms searching in the programmatic space significantly outperform those in LEAPS and HPRL. To explain our results, we measured the "friendliness" of the two spaces to local search algorithms. We discovered that algorithms are more likely to stop at local maxima when searching in the latent space than when searching in the programmatic space. This implies that the optimization topology of the programmatic space, induced by the reward function in conjunction with the neighborhood function, is more conducive to search than that of the latent space. This result provides an explanation for the superior performance in the programmatic space.
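
To make the idea of local search over a programmatic space concrete, here is a minimal sketch of hill climbing with a neighborhood function, plus a rough way to estimate how often the search stops short of the best reward. Everything in it is an assumption made for illustration: the four-action toy DSL, the one-token `neighbors` function, `toy_reward`, and `convergence_rate` are hypothetical stand-ins, not the paper's DSL, neighborhood function, reward, or "friendliness" measure.

```python
import random
from typing import Callable, List, Tuple

Program = Tuple[str, ...]

# Hypothetical toy DSL: a program is a fixed-length sequence of actions.
ACTIONS = ("move", "turnLeft", "turnRight", "pickMarker")
PROGRAM_LENGTH = 4


def neighbors(program: Program) -> List[Program]:
    """All programs that differ from `program` in exactly one token."""
    result = []
    for i, token in enumerate(program):
        for action in ACTIONS:
            if action != token:
                result.append(program[:i] + (action,) + program[i + 1:])
    return result


def toy_reward(program: Program) -> float:
    """Stand-in reward: fraction of tokens matching a hidden target program."""
    target = ("move", "move", "turnLeft", "pickMarker")
    return sum(a == b for a, b in zip(program, target)) / len(target)


def hill_climb(
    start: Program,
    reward: Callable[[Program], float],
    neighborhood: Callable[[Program], List[Program]],
) -> Tuple[Program, float]:
    """Greedy local search: move to the best neighbor until none improves."""
    current, current_reward = start, reward(start)
    while True:
        best = max(neighborhood(current), key=reward)
        best_reward = reward(best)
        if best_reward <= current_reward:
            # No neighbor is strictly better: a local maximum of the
            # topology induced by the reward and the neighborhood function.
            return current, current_reward
        current, current_reward = best, best_reward


def convergence_rate(
    reward: Callable[[Program], float],
    neighborhood: Callable[[Program], List[Program]],
    optimum: float,
    trials: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of random starts from which hill climbing reaches `optimum`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        start = tuple(rng.choice(ACTIONS) for _ in range(PROGRAM_LENGTH))
        _, value = hill_climb(start, reward, neighborhood)
        hits += value >= optimum
    return hits / trials
```

A lower `convergence_rate` for one space than for another, under the same reward and search budget, would indicate that searches in that space stop at local maxima more often. Comparing a latent space in this way would require swapping in a neighborhood function that perturbs a latent vector and decodes it back to a program, which is not shown here.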
