LG | CL | IR | Oct 25, 2024

Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements

Apple
arXiv:2410.19974v1 | 1 citation | h-index: 10 | MMSR@CIKM
Originality: Synthesis-oriented
AI Analysis

This work provides insights into cost-accuracy trade-offs for selecting models in multimodal search applications, but it is incremental as it builds on existing evaluation methods without introducing new paradigms.

The study evaluated how well LLMs and MLLMs align with human judgments of multimodal search relevance. It found that model performance varies by context and that visual components can hinder smaller models.

Large Language Models (LLMs) have demonstrated potential as effective search relevance evaluators. However, there is a lack of comprehensive guidance on which models consistently perform optimally across various contexts or within specific use cases. In this paper, we assess several LLMs and Multimodal Language Models (MLLMs) in terms of their alignment with human judgments across multiple multimodal search scenarios. Our analysis investigates the trade-offs between cost and accuracy, highlighting that model performance varies significantly depending on the context. Interestingly, in smaller models, the inclusion of a visual component may hinder performance rather than enhance it. These findings highlight the complexities involved in selecting the most appropriate model for practical applications.
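As a rough illustration of the kind of comparison the abstract describes, the sketch below scores each model's relevance labels against human labels and lists the result next to a per-judgment cost. The agreement metric (Cohen's kappa), the model names, the toy labels, and the cost figures are all assumptions made for illustration; none of them are taken from the paper.

```python
from collections import Counter


def cohen_kappa(human, model):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)
    # Fraction of items where the model matches the human label.
    observed = sum(h == m for h, m in zip(human, model)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    h_counts, m_counts = Counter(human), Counter(model)
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def summarize(human, runs):
    """Return (name, kappa, cost) rows sorted from cheapest to most expensive.

    `runs` maps a model name to (labels, cost_per_judgment); both are
    hypothetical inputs for this sketch.
    """
    rows = [
        (name, cohen_kappa(human, labels), cost)
        for name, (labels, cost) in runs.items()
    ]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Toy data only: 1 = relevant, 0 = not relevant.
    human_labels = [1, 1, 0, 0]
    runs = {
        "hypothetical-small-mllm": ([1, 1, 0, 1], 0.001),
        "hypothetical-large-llm": ([1, 1, 0, 0], 0.010),
    }
    for name, kappa, cost in summarize(human_labels, runs):
        print(f"{name}: kappa={kappa:.2f}, cost per judgment={cost}")
```

Listing agreement next to cost, rather than collapsing the two into one score, leaves the trade-off visible so that a reader can decide how much extra agreement is worth paying for in a given scenario.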

