Ranked from Within: Ranking Large Multimodal Models Without Labels
This provides a practical, label-free method for selecting models in diverse domains, though it is incremental as it builds on existing uncertainty metrics.
The paper tackled the problem of ranking large multimodal models without labeled data by using uncertainty scores from softmax distributions, achieving robust and consistent rankings across 47 models and 9 visual question answering benchmarks.
Can the relative performance of a pre-trained large multimodal model (LMM) be predicted without access to labels? As LMMs proliferate, it becomes increasingly important to develop efficient ways to choose between them when faced with new data or tasks. The usual approach does the equivalent of giving the models an exam and marking them. We opt to avoid marking and the associated labor of determining the ground-truth answers. Instead, we explore other signals elicited and ascertain how well the models know their own limits, evaluating the effectiveness of these signals at unsupervised model ranking. We evaluate $47$ state-of-the-art LMMs (\eg, LLaVA) across $9$ visual question answering benchmarks, analyzing how well uncertainty-based metrics can predict relative model performance. Our findings show that uncertainty scores derived from softmax distributions provide a robust and consistent basis for ranking models across various tasks. This facilitates the ranking of LMMs on unlabeled data, providing a practical approach for selecting models for diverse target domains without requiring manual annotation.