CV, Mar 3

TRACE: Task-Adaptive Reasoning and Representation Learning for Universal Multimodal Retrieval

arXiv:2603.02929v2, 5 citations, h-index: 18
Originality: Highly original
AI Analysis

This work addresses universal multimodal retrieval for users with diverse intents, targeting applications that require complex query interpretation.

The authors introduce TRACE, which reasons about a query before embedding it. They report state-of-the-art results on the M-BEIR benchmark, a balance between retrieval accuracy and inference throughput obtained by reasoning only on complex queries, and zero-shot transfer to unseen domains.

Universal Multimodal Retrieval requires unified embedding models capable of interpreting diverse user intents, ranging from simple keywords to complex compositional instructions. While Multimodal Large Language Models (MLLMs) possess strong reasoning capabilities, prevailing adaptations confine them to static encoders, underutilizing their generative potential. This encoder-only paradigm struggles with complex intents that demand logical deduction rather than superficial pattern matching. To address this, we introduce TRACE (Task-adaptive Reasoning And Compressing Embeddings). TRACE unifies generative reasoning with discriminative representation learning. It first generates a structured Chain-of-Thought (CoT) to explicitly reason about the query, and subsequently compresses this reasoning trace into a compact embedding via a dedicated token. To train this framework, we construct M-BEIR-CoT, a large-scale dataset featuring a difficulty-aware routing strategy. Experiments on the M-BEIR benchmark establish TRACE as the new state-of-the-art. Crucially, TRACE demonstrates a learned implicit routing behavior. It autonomously activates reasoning for complex queries while bypassing it for simpler ones, achieving an optimal balance between retrieval accuracy and inference throughput. Furthermore, by internalizing the deductive process, TRACE exhibits remarkable zero-shot transferability to unseen domains and novel constraints.
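The reason-then-compress idea in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: the token name `<emb>`, the choice to take the hidden state at that token's position, the L2 normalization, and the dot-product ranking are all assumptions made for illustration, and the generated tokens and hidden states are hard-coded stand-ins for real model output.

```python
import math

# Hypothetical name for the dedicated compression token (not from the paper).
EMB_TOKEN = "<emb>"


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pool_embedding(tokens, hidden_states):
    """Assumed pooling: normalized hidden state at the first EMB_TOKEN position."""
    idx = tokens.index(EMB_TOKEN)
    return l2_normalize(hidden_states[idx])


def used_reasoning(tokens):
    """True if any generated tokens precede EMB_TOKEN, i.e. reasoning was emitted."""
    return tokens.index(EMB_TOKEN) > 0


def rank(query_emb, candidates):
    """Order candidate ids by dot product with the query (cosine for unit vectors)."""
    scores = {
        cid: sum(q * c for q, c in zip(query_emb, emb))
        for cid, emb in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Stand-in outputs: a complex query that emits reasoning before the token,
# and a simple query that emits the token immediately.
complex_tokens = ["<think>", "step", "</think>", EMB_TOKEN]
complex_hidden = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.2], [3.0, 4.0]]
simple_tokens = [EMB_TOKEN]
simple_hidden = [[1.0, 0.0]]

candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

complex_emb = pool_embedding(complex_tokens, complex_hidden)
simple_emb = pool_embedding(simple_tokens, simple_hidden)

print(used_reasoning(complex_tokens), rank(complex_emb, candidates))
print(used_reasoning(simple_tokens), rank(simple_emb, candidates))
```

In this toy setup the routing behavior described in the abstract shows up only as the position of the token: a query whose output begins with `<emb>` skips the reasoning prefix and costs a single decoding step, while a complex query pays for the extra generated tokens before its embedding is read out.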

