LG
Dec 30, 2021

Reversible Upper Confidence Bound Algorithm to Generate Diverse Optimized Candidates

arXiv:2112.14893v1
4 citations
Originality: Incremental advance
AI Analysis

The work addresses the need for diverse candidate generation in applications such as drug discovery, an incremental improvement over existing bandit algorithms that focus solely on maximizing reward.

The paper tackles the problem of finding a diverse set of high-reward candidates in multi-armed bandit settings, such as drug discovery, by proposing a reversible upper confidence bound (rUCB) algorithm. In virtual screening on intrinsically disordered proteins, rUCB is reported to greatly reduce query times while achieving high accuracy and low performance loss.

Most algorithms for the multi-armed bandit problem in reinforcement learning aim to maximize the expected reward, and are thus useful for finding the single optimized candidate with the highest reward (function value) in diverse applications (e.g., AlphaGo). However, in some typical application scenarios such as drug discovery, the aim is to find a diverse set of candidates with high reward. Here we propose a reversible upper confidence bound (rUCB) algorithm for this purpose and demonstrate its application in virtual screening on intrinsically disordered proteins (IDPs). It is shown that rUCB greatly reduces the query times while achieving both high accuracy and low performance loss. The rUCB may have potential application in multipoint optimization and other reinforcement-learning cases.
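For context, the reward-maximizing baseline that the abstract contrasts against can be sketched with standard UCB1. This is not the paper's rUCB, whose details are not given on this page. The function name `ucb1`, the exploration constant `c`, and the Bernoulli arm probabilities are all illustrative assumptions.

```python
import math
import random


def ucb1(reward_fns, n_rounds, c=2.0, seed=0):
    """Run standard UCB1 over a list of reward-sampling callables.

    Each callable takes a random.Random instance and returns a reward.
    Returns the pull counts and empirical mean rewards per arm.
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # running mean reward per arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once before using the bound
        else:
            # Upper confidence bound: empirical mean plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


# Hypothetical Bernoulli arms with made-up success probabilities
probs = [0.2, 0.5, 0.8]
arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in probs]
counts, means = ucb1(arms, n_rounds=2000)
print(counts, [round(m, 2) for m in means])
```

A policy like this concentrates its pulls on the single best arm, which is why maximizing expected reward alone does not yield the diverse set of high-reward candidates the abstract asks for.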

