LG, Dec 27, 2025

GLUE: Gradient-free Learning to Unify Experts

arXiv:2512.22467v2
h-index: 34
Originality: Incremental advance
AI Analysis

GLUE addresses the need for efficient domain adaptation in systems such as multilingual ASR and cross-hospital imaging, offering a computationally cheaper alternative to backpropagation-based methods.

The paper tackles domain expansion by blending multiple pretrained specialist models into a single strong initialization for a target model. It proposes GLUE, which learns the mixture coefficients with gradient-free optimization rather than backpropagation. The reported gains in test accuracy are up to 8.5% over data-size weighting and up to 9.1% over proxy-metric selection.

In many deployed systems (multilingual ASR, cross-hospital imaging, region-specific perception), multiple pretrained specialist models coexist. Yet new target domains often require domain expansion: a generalized model that performs well beyond any single specialist's domain. Given a new target domain, existing methods blend expert models into a single strong initialization prior for the target model's parameters. However, heuristic blending (using mixing coefficients based on data size or proxy metrics) often yields lower target-domain test accuracy, and learning these coefficients on the target domain's loss function typically requires computationally expensive full backpropagation through a neural network. We propose GLUE, Gradient-free Learning to Unify Experts, which initializes the target model as a convex combination of fixed experts and learns the mixture coefficients of this combination via gradient-free two-point SPSA (simultaneous perturbation stochastic approximation) updates, requiring only two forward passes per step. Across experiments on three datasets and three network architectures, GLUE produces model parameter priors that can be fine-tuned to outperform baselines. GLUE improves test accuracy by up to 8.5% over data-size weighting and by up to 9.1% over proxy-metric selection. GLUE either outperforms backpropagation-based full-gradient mixing or matches its performance within 1.4%.
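
The two-point SPSA update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the mixture coefficients are kept on the simplex through a softmax over unconstrained logits (the paper may use a different parameterization), that each expert is a flattened parameter vector, and that the target-domain loss is a black-box function of the blended parameters. The names `spsa_mixture`, `blend`, and `toy_loss`, the step size `lr`, and the perturbation size `c` are all hypothetical.

```python
import numpy as np


def softmax(z):
    """Map unconstrained logits to convex-combination weights."""
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def blend(experts, alpha):
    """Convex combination of expert parameter vectors.

    experts: array of shape (K, D), one flattened expert per row.
    alpha:   array of shape (K,), non-negative and summing to 1.
    """
    return alpha @ experts


def spsa_mixture(experts, loss_fn, steps=500, lr=0.5, c=0.1, seed=0):
    """Learn mixture weights with two-point SPSA (illustrative only).

    Each step evaluates loss_fn exactly twice (two "forward passes")
    and never computes a gradient through the model.
    """
    rng = np.random.default_rng(seed)
    k = experts.shape[0]
    logits = np.zeros(k)  # assumption: start from the uniform mixture
    for _ in range(steps):
        # Rademacher perturbation: every coordinate is +1 or -1.
        delta = rng.choice([-1.0, 1.0], size=k)
        loss_plus = loss_fn(blend(experts, softmax(logits + c * delta)))
        loss_minus = loss_fn(blend(experts, softmax(logits - c * delta)))
        # SPSA estimate; dividing by delta_i equals multiplying by it
        # because delta_i is +1 or -1.
        grad_est = (loss_plus - loss_minus) / (2.0 * c) * delta
        logits = logits - lr * grad_est
    return softmax(logits)


# Toy stand-in for a target-domain loss: three one-hot "experts" and a
# target that is itself a convex combination of them (hypothetical data).
experts = np.eye(3)
target = np.array([0.6, 0.1, 0.3])


def toy_loss(theta):
    return float(np.sum((theta - target) ** 2))


alpha = spsa_mixture(experts, toy_loss)
initial_loss = toy_loss(blend(experts, np.full(3, 1.0 / 3.0)))
final_loss = toy_loss(blend(experts, alpha))
print("weights:", alpha)
print("loss at uniform mixture:", initial_loss)
print("loss after SPSA:", final_loss)
```

In a real setting `loss_fn` would load the blended parameters into the network and run a forward pass on a target-domain batch. The cost per step stays at two loss evaluations regardless of the number of experts, which is the property the abstract highlights.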

