LG · NE · Jan 29

READY: Reward Discovery for Meta-Black-Box Optimization

arXiv:2601.21847v1 · h-index: 10
Originality: Incremental advance
AI Analysis

This paper addresses reward design issues relevant to researchers in optimization and reinforcement learning, but it appears incremental: it builds on existing MetaBBO methods and adds a new automation approach.

The paper tackles the problem of human-designed reward functions in Meta-Black-Box Optimization (MetaBBO), which can introduce bias and reward hacking, by using Large Language Models (LLMs) as an automated reward discovery tool. It reports that the discovered reward functions can boost existing MetaBBO methods, though the abstract provides no concrete numbers.

Meta-Black-Box Optimization (MetaBBO) is an emerging avenue within the optimization community, in which an algorithm design policy is meta-learned by reinforcement learning to enhance optimization performance. So far, the reward functions in existing MetaBBO works have been designed by human experts, which introduces design bias and the risk of reward hacking. In this paper, we use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. Specifically, we consider both effectiveness and efficiency. For effectiveness, we borrow the idea of evolution of heuristics and introduce a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. For efficiency, we additionally introduce a multi-task evolution architecture to support parallel reward discovery for diverse MetaBBO approaches. This parallel process also benefits from knowledge sharing across tasks, which accelerates convergence. Empirical results demonstrate that the reward functions discovered by our approach can help boost existing MetaBBO works, underscoring the importance of reward design in MetaBBO. We provide READY's project at https://anonymous.4open.science/r/ICML_READY-747F.
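
The abstract describes an iterative evolutionary search over reward programs, run in parallel across several MetaBBO tasks with knowledge sharing between them. The toy sketch below illustrates only that general loop shape and is not READY's implementation. Everything in it is an assumption made for illustration: the two-weight reward parameterization, the `propose` stub (a random perturbation standing in for an LLM call), the `evaluate` placeholder (a real run would presumably train a MetaBBO agent with the candidate reward and measure its optimization performance), and the task names.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate encoding: (w_improve, w_cost). The paper searches over
# reward programs; two weights are used here only to keep the sketch small.
Candidate = Tuple[float, float]


def make_reward(c: Candidate) -> Callable[[float, float], float]:
    """Turn a candidate into a reward function of (improvement, cost)."""
    w_improve, w_cost = c
    return lambda improvement, cost: w_improve * improvement - w_cost * cost


def propose(parent: Candidate, rng: random.Random) -> Candidate:
    """Stand-in for the LLM proposal step: a small random perturbation."""
    return (parent[0] + rng.gauss(0, 0.2), parent[1] + rng.gauss(0, 0.2))


def evaluate(c: Candidate, target: Candidate) -> float:
    """Placeholder fitness: negative squared distance to a made-up target."""
    return -((c[0] - target[0]) ** 2 + (c[1] - target[1]) ** 2)


def evolve(tasks: Dict[str, Candidate], generations: int = 30,
           pop_size: int = 6, seed: int = 0):
    """Evolve one population per task, sharing each task's elite with the others."""
    rng = random.Random(seed)
    pops: Dict[str, List[Candidate]] = {
        t: [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(pop_size)]
        for t in tasks
    }
    history: Dict[str, List[float]] = {t: [] for t in tasks}
    for _ in range(generations):
        # Best candidate per task, computed before any population changes.
        elites = {t: max(pops[t], key=lambda c: evaluate(c, tasks[t]))
                  for t in tasks}
        for t, target in tasks.items():
            children = [propose(rng.choice(pops[t]), rng) for _ in range(pop_size)]
            migrants = [elites[o] for o in tasks if o != t]  # knowledge sharing
            pool = pops[t] + children + migrants
            pool.sort(key=lambda c: evaluate(c, target), reverse=True)
            pops[t] = pool[:pop_size]  # elitist survivor selection
            history[t].append(evaluate(pops[t][0], target))
    best = {t: pops[t][0] for t in tasks}
    return best, history


if __name__ == "__main__":
    # Hypothetical tasks, each with a made-up "ideal" weight pair.
    demo_tasks = {"task_a": (1.0, 0.1), "task_b": (0.8, 0.3)}
    best, history = evolve(demo_tasks)
    for name in demo_tasks:
        print(name, best[name], history[name][-1])
```

Because parents stay in the selection pool, the best fitness per task never decreases from one generation to the next, which is one simple way to obtain the "continuous improvement" property the abstract mentions. Whether READY achieves it this way is not stated in the abstract.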

