LG · AI · ML · Oct 8, 2020

Maximum Reward Formulation In Reinforcement Learning

arXiv:2010.03744v2 · 18 citations
Originality: Highly original
AI Analysis

The paper addresses a real-world bottleneck in RL for domains such as drug discovery, where the standard cumulative-return formulation needs to be adapted.

The paper tackles the problem of reinforcement learning (RL) in applications like drug discovery, where maximizing the expected maximum reward along a trajectory is more relevant than cumulative return, and achieves state-of-the-art results on molecule generation tasks.

Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial applications in the real world, such as drug discovery, do not fit within this framework because an RL agent only needs to identify states (molecules) that achieve the highest reward within a trajectory and does not need to optimize for the expected cumulative return. In this work, we formulate an objective function to maximize the expected maximum reward along a trajectory, derive a novel functional form of the Bellman equation, introduce the corresponding Bellman operators, and provide a proof of convergence. Using this formulation, we achieve state-of-the-art results on the task of molecule generation that mimics a real-world drug discovery pipeline.
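
To make the difference between the two objectives concrete, the sketch below compares the cumulative return of a trajectory with its discounted maximum reward, and then runs a tabular backup of the form Q(s, a) = max(r, gamma * max over a' of Q(s', a')). That backup is an assumption chosen to match the objective described in the abstract for deterministic transitions; it is not quoted from the paper, and the toy MDP, state names, and function names are hypothetical.

```python
def cumulative_return(rewards, gamma=1.0):
    """Standard objective: discounted sum of rewards along one trajectory."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def max_reward_return(rewards, gamma=1.0):
    """Max-reward objective: max over t of gamma**t * r_t (assumes gamma >= 0)."""
    best = rewards[-1]
    for r in reversed(rewards[:-1]):
        # Recursive form: G_t = max(r_t, gamma * G_{t+1}).
        best = max(r, gamma * best)
    return best


def max_reward_q_iteration(transitions, rewards, gamma=0.9, sweeps=50):
    """Tabular sweeps with a max-reward backup on a deterministic MDP.

    transitions[s][a] is the next state, or None if the episode ends.
    rewards[(s, a)] is the immediate reward. Both are illustrative inputs.
    """
    q = {s: {a: 0.0 for a in actions} for s, actions in transitions.items()}
    for _ in range(sweeps):
        for s, actions in transitions.items():
            for a, s_next in actions.items():
                r = rewards[(s, a)]
                if s_next is None:
                    # Terminal step: only the immediate reward counts.
                    q[s][a] = r
                else:
                    # Assumed backup: keep the larger of the reward now
                    # and the discounted best value reachable later.
                    q[s][a] = max(r, gamma * max(q[s_next].values()))
    return q


# Hypothetical two-state MDP: from s0, action "a" leads to s1, "b" terminates.
TRANSITIONS = {
    "s0": {"a": "s1", "b": None},
    "s1": {"a": None},
}
REWARDS = {("s0", "a"): 0.5, ("s0", "b"): 0.6, ("s1", "a"): 1.0}

trajectory = [0.1, 0.9, 0.2]
q_values = max_reward_q_iteration(TRANSITIONS, REWARDS, gamma=0.9)
```

For the sample trajectory with gamma = 1, the cumulative return is about 1.2 while the max-reward return is 0.9: only the best state along the trajectory counts, which matches the drug-discovery setting where a single high-scoring molecule is the goal.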

Code Implementations: 1 repo
