AI | LG | May 4

Submodular Benchmark Selection

arXiv:2605.02209
68.31 citations
AI Analysis

For researchers evaluating large language models, this provides a principled method to reduce evaluation cost by selecting a small, informative benchmark subset.

The paper formalizes benchmark subset selection as submodular maximization under a multivariate Gaussian model, showing that mutual information outperforms entropy for imputation on small subsets across three matrices from ten public leaderboards.

Evaluating large language models across many benchmarks is expensive, yet many benchmarks are highly correlated. We formalize the selection of a small, informative subset as submodular maximization under a multivariate Gaussian model. Entropy (log-determinant covariance) and mutual information between selected and remaining benchmarks arise as natural objectives. Both are submodular; entropy selection coincides with pivoted Cholesky and has spectral residual bounds, while mutual information is non-monotone in general but empirically monotone for small subsets, so we optimize it greedily. Experiments on three matrices from ten public leaderboards show that mutual information selection outperforms entropy for imputation at small subsets.
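The two greedy rules described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`conditional_variance`, `greedy_entropy`, `greedy_mutual_information`) are hypothetical, the covariance is assumed to be positive definite (a small ridge is added in the demo), and the scores are synthetic. Under a Gaussian model, the greedy entropy step picks the benchmark with the largest variance conditioned on the already selected set, which is the pivot rule of pivoted Cholesky. The greedy mutual information step picks the benchmark that maximizes the gap between its conditional entropy given the selected set and its conditional entropy given all other unselected benchmarks.

```python
import numpy as np


def conditional_variance(cov, j, cond):
    """Variance of benchmark j given the benchmarks in `cond` (Gaussian model)."""
    cond = list(cond)
    if not cond:
        return float(cov[j, j])
    cross = cov[np.ix_(cond, [j])]      # covariances between cond and j
    block = cov[np.ix_(cond, cond)]     # covariance among cond
    return float(cov[j, j] - (cross.T @ np.linalg.solve(block, cross)).item())


def greedy_entropy(cov, k):
    """Greedy log-det selection: take the largest conditional variance each step."""
    selected = []
    for _ in range(k):
        rest = [j for j in range(cov.shape[0]) if j not in selected]
        best = max(rest, key=lambda j: conditional_variance(cov, j, selected))
        selected.append(best)
    return selected


def greedy_mutual_information(cov, k):
    """Greedy selection for I(selected; remaining) under a Gaussian model."""
    selected = []
    for _ in range(k):
        rest = [j for j in range(cov.shape[0]) if j not in selected]

        def gain(j):
            others = [i for i in rest if i != j]
            # Marginal gain: H(j | selected) - H(j | other unselected benchmarks)
            num = conditional_variance(cov, j, selected)
            den = conditional_variance(cov, j, others)
            return 0.5 * np.log(num / den)

        selected.append(max(rest, key=gain))
    return selected


# Synthetic demo: 50 hypothetical models scored on 6 correlated benchmarks.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
scores = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(50, 6))
cov_demo = np.cov(scores, rowvar=False) + 1e-6 * np.eye(6)  # ridge keeps it PD

print("entropy picks:", greedy_entropy(cov_demo, 3))
print("mutual information picks:", greedy_mutual_information(cov_demo, 3))
```

This sketch re-solves a linear system for every candidate at every step, which is fine for a few dozen benchmarks. An incremental pivoted Cholesky update would give the same entropy selections at lower cost.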

