OCAIMay 9

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

arXiv:2605.0920996.8
AI Analysis

For researchers in bilevel optimization and hyperparameter tuning, this work provides a theoretical and algorithmic framework for handling non-isolated lower-level solution sets, which is a common but previously underexplored scenario.

This paper addresses optimistic bilevel optimization when the lower-level problem has a manifold of minimizers, showing that differentiability of the hyper-objective requires uniqueness of the optimistic selection rather than a singleton solution set. The proposed HG-MS method achieves convergence to a stationary point with complexity governed by the intrinsic dimension of the solution manifold and obtains state-of-the-art results on LLM source reweighting tasks, including best GSM8K/MATH scores across tested backbones.

We study optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers. In this setting, the hyper-objective may be non-differentiable because the upper-level criterion must choose among multiple lower-level solutions. Under a local Polyak--Łojasiewicz (PŁ) condition, we show that differentiability does not require the lower-level solution set to be a singleton: uniqueness of the optimistic selection is sufficient. This yields an explicit pseudoinverse-based hyper-gradient formula extending the classical singleton-minimizer result. We further characterize the regularity of the hyper-objective: non-degeneracy of the selected minimizer along the solution manifold yields local smoothness, while failure of uniqueness can create many non-differentiable points and failure of non-degeneracy can destroy all positive Hölder regularity of the hyper-gradient. Motivated by this theory, we propose HG-MS, a select-then-differentiate method combining explicit optimistic selection with efficient pseudoinverse-based hyper-gradient computation. Despite the nonconvex nature of optimistic selection over the lower-level solution manifold, we show that HG-MS converges to a stationary point of the optimistic objective with complexity governed by the intrinsic dimension of the solution manifold rather than its ambient dimension. Empirically, we test a practical variant of HG-MS for matched-budget LLM source reweighting. This variant preserves the select-then-differentiate principle and obtains the best GSM8K/MATH scores across the tested backbones, along with competitive or best MT-Bench instruction-following results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes