SE · Apr 22

Are Decoder-Only Large Language Models the Silver Bullet for Code Search?

Yuxuan Chen, Mingwei Liu, Guangsheng Ou, Anji Li, Dekun Dai, Yanlin Wang, Zibin Zheng

arXiv:2410.22240 · 68.17 citations · h-index: 18

Predicted impact: top 22% in SE · last 90 days · Originality: Incremental advance

AI Analysis

The paper offers a practical guide for developers and researchers on optimizing LLMs for code search, though the advance is incremental because it compares existing model types.

The paper systematically evaluates eleven decoder-only large language models for code search and finds that fine-tuned models, particularly CodeGemma, achieve a 40.4% higher Mean Average Precision than encoder-only models such as UniXcoder on the CoSQA+ benchmark.

Code search is essential for code reuse, allowing developers to efficiently locate relevant code snippets. The advent of powerful decoder-only Large Language Models (LLMs) has revolutionized many code intelligence tasks. However, their effectiveness for the retrieval-based task of code search, particularly compared to established encoder-based models, remains underexplored. This paper addresses this gap by presenting a large-scale systematic evaluation of eleven decoder-only LLMs, analyzing their performance across zero-shot and fine-tuned settings. Our results show that fine-tuned decoder-only models, particularly CodeGemma, significantly outperform encoder-only models like UniXcoder, achieving a 40.4% higher Mean Average Precision (MAP) on the CoSQA$^+$ benchmark. Our analysis further reveals two crucial nuances for practitioners: first, the relationship between model size and performance is non-monotonic, with mid-sized models often outperforming larger variants; second, the composition of the training data is critical, as a multilingual dataset enhances generalization while a small amount of data from a specific language can act as noise and interfere with model effectiveness. These findings offer a comprehensive guide to selecting and optimizing modern LLMs for code search.
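For readers unfamiliar with the headline metric, the following is a minimal sketch of Mean Average Precision (MAP) in its standard textbook form. It is not taken from the paper: the function names and the toy identifiers are hypothetical, and the authors' evaluation script may differ (for example, by truncating rankings at a cutoff).

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    `ranked` is the retrieved list of code IDs, best first;
    `relevant` is the set of IDs judged correct for the query.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision measured at the rank of each relevant hit.
            precision_sum += hits / rank
    # Normalize by the number of relevant items, so missed items count as zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of per-query average precision over (ranked, relevant) pairs."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical IDs): two queries with their ranked results.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # hits at ranks 1 and 3
    (["x", "y"], {"y"}),                 # single hit at rank 2
]
map_score = mean_average_precision(runs)
```

Because the metric rewards every relevant snippet that is ranked highly, it suits settings where a query can have more than one correct answer.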
