AI | Jun 18, 2016

On Reward Function for Survival

arXiv:1606.05767v2 | 8 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses survival strategies for biological agents, but it appears incremental as it generalizes prior formulations and applies existing reinforcement learning methods.

The paper formulates the search for a survival strategy for biological agents as the maximization of multi-step survival probability. It converts that objective into a reinforcement learning problem whose reward function is the log of the temporal survival probability, and shows empirically that agents learn survival behavior with this reward.

Obtaining a survival strategy (policy) is one of the fundamental problems of biological agents. In this paper, we generalize the formulation used in previous research on agent survival and formulate the survival problem as the maximization of the multi-step survival probability over future time steps. We introduce a method for converting this maximization into a classical reinforcement learning problem. Under this conversion, the reward function (the negative temporal cost function) is expressed as the log of the temporal survival probability. We also show that the resulting reinforcement learning objective is proportional to the variational lower bound of the original problem. Finally, we empirically demonstrate that the agent learns survival behavior by using the reward function introduced in this paper.
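
The link between a log-probability reward and multi-step survival can be illustrated with a minimal sketch. It is not the paper's derivation, which works with expectations over trajectories and a variational lower bound. It assumes a single fixed trajectory whose per-step survival probabilities (each conditional on having survived so far) are already known. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import math


def survival_rewards(step_survival_probs):
    """Reward at each step: the log of that step's survival probability."""
    return [math.log(p) for p in step_survival_probs]


def multi_step_survival_prob(step_survival_probs):
    """Probability of surviving every step: the product of the per-step values."""
    return math.prod(step_survival_probs)


# Hypothetical per-step survival probabilities along one trajectory.
probs = [0.99, 0.95, 0.90, 0.97]

rewards = survival_rewards(probs)
undiscounted_return = sum(rewards)

# Summing the log-probability rewards recovers the log of the
# multi-step survival probability for this trajectory.
assert math.isclose(undiscounted_return, math.log(multi_step_survival_prob(probs)))
```

Because the logarithm is monotonic, a trajectory with a higher undiscounted return under this reward also has a higher multi-step survival probability. Each reward is non-positive, which matches the abstract's reading of the reward as a negative temporal cost.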

