LG | Apr 2, 2024

Predicting the Performance of Foundation Models via Agreement-on-the-Line

arXiv:2404.01542v2 | 5 citations | h-index: 26 | NIPS
Originality: Incremental advance
AI Analysis

The paper addresses a critical deployment challenge for foundation models: estimating performance in real-world applications where labels are limited. The advance is incremental, since it builds on existing agreement-on-the-line methods.

The paper tackles the problem of predicting the out-of-distribution performance of foundation models when labels are scarce by leveraging the agreement-on-the-line phenomenon. It finds that random head initialization reliably induces this phenomenon in ensembles across vision and language benchmarks, which enables high-precision predictions.

Estimating out-of-distribution (OOD) performance in regimes where labels are scarce is critical to safely deploying foundation models. Recently, it was shown that ensembles of neural networks exhibit the phenomenon of "agreement-on-the-line", which can be leveraged to reliably predict OOD performance without labels. However, in contrast to classical neural networks that are trained on in-distribution data from scratch for numerous epochs, foundation models undergo minimal finetuning from heavily pretrained weights, which may reduce the ensemble diversity needed to observe agreement-on-the-line. In our work, we first demonstrate that when lightly finetuning multiple runs from a single foundation model, the choice of randomness during training (linear head initialization, data ordering, and data subsetting) can lead to drastically different levels of agreement-on-the-line in the resulting ensemble. Surprisingly, only random head initialization is able to reliably induce agreement-on-the-line in finetuned foundation models across vision and language benchmarks. Second, we demonstrate that ensembles of multiple foundation models pretrained on different datasets but finetuned on the same task can also show agreement-on-the-line. In total, by careful construction of a diverse ensemble, we can utilize agreement-on-the-line-based methods to predict the OOD performance of foundation models with high precision.
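The abstract does not spell out how agreement between models becomes an accuracy estimate, so a minimal sketch may help. It follows the general recipe of earlier agreement-on-the-line work as I understand it: measure pairwise agreement on in-distribution and OOD inputs, fit a line in probit-transformed space, then apply that line to each model's labeled in-distribution accuracy. The probit scaling, the plain least-squares fit, and all function names here are illustrative assumptions, not the paper's actual implementation.

```python
from itertools import combinations
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def probit(p, eps=1e-6):
    """Inverse standard-normal CDF, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return _STD_NORMAL.inv_cdf(p)


def agreement(preds_a, preds_b):
    """Fraction of examples on which two models predict the same label."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + bias."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def predict_ood_accuracy(id_preds, ood_preds, id_accs):
    """Estimate each model's OOD accuracy without any OOD labels.

    id_preds / ood_preds: one list of predicted labels per model.
    id_accs: each model's accuracy on labeled in-distribution data.
    Assumption: the line fitted to (ID agreement, OOD agreement) pairs
    also describes (ID accuracy, OOD accuracy).
    """
    pairs = list(combinations(range(len(id_preds)), 2))
    xs = [probit(agreement(id_preds[i], id_preds[j])) for i, j in pairs]
    ys = [probit(agreement(ood_preds[i], ood_preds[j])) for i, j in pairs]
    slope, bias = fit_line(xs, ys)
    return [_STD_NORMAL.cdf(slope * probit(acc) + bias) for acc in id_accs]
```

The fit needs at least three models whose pairwise in-distribution agreements are not all identical; otherwise the slope is undefined. This is also where ensemble diversity matters: if the finetuned models agree almost everywhere, the agreement points bunch together and the fitted line carries little information.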

