ML · MTRL-SCI · LG · BM · May 3, 2024

Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design

arXiv:2405.02449v1 · 18 citations · h-index: 17 · ICML
AI Analysis

This work addresses the collapse problem in experimental design for fields such as drug discovery and materials science and offers a flexible balance between quality and diversity, though it is incremental in that it builds on existing Vendi scores.

The paper tackles the tendency of experimental design algorithms to favor exploitation over exploration, which traps them in local optima and prevents the collection of diverse, high-quality data. The authors extend Vendi scores to account for quality, apply them to domains such as drug discovery and reinforcement learning, and report a 70%-170% increase in effective discoveries compared to baselines.

Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores, a family of interpretable similarity-based diversity metrics, to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines.
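
To make the idea of a quality-weighted diversity score concrete, here is a minimal sketch in pure Python. The abstract does not spell out the definition, so two assumptions are made: the quality-weighted score is taken to be the mean quality of the set multiplied by its Vendi score, and the order-2 member of the Vendi family is used because it reduces to 1 / sum_ij (K_ij / n)^2 and needs no eigendecomposition. The RBF kernel, the function names, and the toy data are hypothetical choices for illustration, not the authors' code.

```python
import math


def rbf_kernel(x, y, lengthscale=1.0):
    """Illustrative similarity in (0, 1]; equals 1 when x == y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def vendi_score_order2(points, kernel=rbf_kernel):
    """Order-2 Vendi score (assumed form): 1 / sum_ij (K_ij / n)^2.

    Ranges from 1 (all points identical) to n (all points dissimilar).
    """
    n = len(points)
    total = 0.0
    for xi in points:
        for xj in points:
            total += (kernel(xi, xj) / n) ** 2
    return 1.0 / total


def quality_weighted_vendi(points, qualities, kernel=rbf_kernel):
    """Assumed definition: mean quality times the diversity score."""
    mean_quality = sum(qualities) / len(qualities)
    return mean_quality * vendi_score_order2(points, kernel)


# Toy data: same qualities, different spread in the search space.
collapsed = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
spread = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
qualities = [0.9, 0.8, 0.7]

print(quality_weighted_vendi(collapsed, qualities))  # diversity near 1
print(quality_weighted_vendi(spread, qualities))     # diversity near 3
```

Under these assumptions, the collapsed set scores roughly its mean quality alone, while the spread set scores about three times higher. A policy that maximizes such a score is therefore rewarded for finding high-quality points that are also dissimilar from one another.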

Code Implementations: 1 repo