LG, Jan 7

Survival Dynamics of Neural and Programmatic Policies in Evolutionary Reinforcement Learning

arXiv:2601.04365v1 (Has Code)

Originality: Incremental advance

AI Analysis

This work addresses the limited interpretability of neural policies in evolutionary reinforcement learning. The advance is incremental because it builds on existing testbeds.

The study compares programmatic policies (PERL) with neural policies (NERL) in evolutionary reinforcement learning. PERL agents survived on average 201.69 steps longer than NERL agents, and programmatic agents using learning alone (no evolution) outlived neural agents using both learning and evolution by 73.67 steps on average.

In evolutionary reinforcement learning (ERL) tasks, agent policies are often encoded as small artificial neural networks (NERL). Such representations lack explicit modular structure, limiting behavioral interpretation. We investigate whether programmatic policies (PERL), implemented as soft, differentiable decision lists (SDDL), can match the performance of NERL. To support reproducible evaluation, we provide the first fully specified and open-source reimplementation of the classic 1992 Artificial Life (ALife) ERL testbed. We conduct a rigorous survival analysis across 4000 independent trials, utilizing Kaplan-Meier curves and Restricted Mean Survival Time (RMST), metrics absent in the original study. We find a statistically significant difference in survival probability between PERL and NERL. PERL agents survive on average 201.69 steps longer than NERL agents. Moreover, SDDL agents using learning alone (no evolution) survive on average 73.67 steps longer than neural agents using both learning and evolution. These results demonstrate that programmatic policies can exceed the survival performance of neural policies in ALife.
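The abstract reports results through Kaplan-Meier curves and RMST differences. As a rough illustration of what those two quantities measure, here is a minimal NumPy sketch. It is not the authors' code: the function names, the toy arrays, and the choice of horizon `tau` are all assumptions made for this example. A trial is treated as "observed" if the agent died and as censored if the run was cut off while the agent was still alive.

```python
import numpy as np


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate: returns (event_times, survival_probabilities).

    durations: steps survived in each trial.
    observed:  True if the agent died at that step, False if censored.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])  # sorted distinct death times
    survival = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(durations >= t)               # still alive just before t
        deaths = np.sum((durations == t) & observed)   # deaths exactly at t
        s *= 1.0 - deaths / at_risk
        survival[i] = s
    return event_times, survival


def rmst(event_times, survival, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    mask = event_times < tau  # events at or after tau add no area
    knots = np.concatenate(([0.0], event_times[mask], [tau]))
    heights = np.concatenate(([1.0], survival[mask]))
    return float(np.sum(np.diff(knots) * heights))


# Hypothetical toy data (NOT from the paper): steps survived per trial.
perl_steps = [300, 450, 500, 500, 620]
perl_died = [True, True, False, True, True]
nerl_steps = [120, 200, 310, 400, 500]
nerl_died = [True, True, True, True, False]

tau = 500.0  # assumed horizon for this example
rmst_perl = rmst(*kaplan_meier(perl_steps, perl_died), tau)
rmst_nerl = rmst(*kaplan_meier(nerl_steps, nerl_died), tau)
rmst_gap = rmst_perl - rmst_nerl  # average extra steps survived up to tau
```

An RMST difference such as `rmst_gap` is the kind of quantity behind a statement like "survive on average N steps longer": it is the gap in area under the two survival curves up to the chosen horizon. The significance test mentioned in the abstract is not shown here.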
