IR AI CL LG

Jul 5, 2025

A Comparative Study of Specialized LLMs as Dense Retrievers

arXiv:2507.03958v2

8.51 citations

h-index: 7

CCIR

Originality: Synthesis-oriented

AI Analysis

This work addresses how to optimize LLMs for unified retrieval across text, code, and multimodal content. Its contribution is incremental: it systematically compares existing specialized models rather than introducing new methods.

This study investigates how domain-specific adaptations in large language models (LLMs) affect their retrieval performance. It finds that mathematical specialization and long reasoning degrade retrieval, while vision-language and code-specialized models excel, with code-specialized LLMs surpassing BM25 on code retrieval tasks.

While large language models (LLMs) are increasingly deployed as dense retrievers, the impact of their domain-specific specialization on retrieval effectiveness remains underexplored. This investigation systematically examines how task-specific adaptations in LLMs influence their retrieval capabilities, an essential step toward developing unified retrievers capable of handling text, code, images, and multimodal content. We conduct extensive experiments with eight Qwen2.5 7B LLMs, including base, instruction-tuned, code/math-specialized, long reasoning, and vision-language models across zero-shot retrieval settings and the supervised setting. For the zero-shot retrieval settings, we consider text retrieval from the BEIR benchmark and code retrieval from the CoIR benchmark. Further, to evaluate supervised performance, all LLMs are fine-tuned on the MS MARCO dataset. We find that mathematical specialization and the long reasoning capability cause consistent degradation in three settings, indicating conflicts between mathematical reasoning and semantic matching. The vision-language model and code-specialized LLMs demonstrate superior zero-shot performance compared to other LLMs, even surpassing BM25 on the code retrieval task, and maintain comparable performance to base LLMs in supervised settings. These findings suggest promising directions for the unified retrieval task leveraging cross-domain and cross-modal fusion.
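To make the dense-retrieval setting concrete, the following is a minimal sketch of how a query and candidate documents are typically ranked once each has been embedded as a vector. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the pooling method or similarity function, so cosine similarity is assumed here, and the three-dimensional vectors, document ids, and the function names `cosine` and `rank_documents` are hypothetical stand-ins for embeddings that an LLM encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; in practice these would come from an LLM encoder.
doc_vecs = {
    "doc_text": [1.0, 0.0, 0.0],
    "doc_code": [0.0, 1.0, 0.0],
    "doc_mixed": [1.0, 1.0, 0.0],
}
query_vec = [0.9, 0.1, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a zero-shot setting, the encoder producing these vectors is used without retrieval-specific training; in the supervised setting it would first be fine-tuned (on MS MARCO in this study), while the ranking step stays the same.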
