LG · IR · SE · Mar 7, 2025

LoRACode: LoRA Adapters for Code Embeddings

arXiv:2503.05315v2 · h-index: 2 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses scalability and cost issues in code retrieval for developers and researchers, though it is incremental as it applies an existing adaptation technique to a specific domain.

The paper tackles inefficient and costly code embeddings for semantic code search by introducing a parameter-efficient fine-tuning method based on LoRA adapters. It reports gains in Mean Reciprocal Rank (MRR) of up to 9.1% for Code2Code search and up to 86.69% for Text2Code search, while reducing trainable parameters to less than 2% of the base model.
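To make the "less than 2%" figure concrete, here is a minimal sketch of how the trainable-parameter fraction of a LoRA setup can be estimated. It relies on the standard LoRA construction, in which a frozen weight matrix of shape d_out x d_in gets two small trainable factors of shapes d_out x r and r x d_in. The function name and all numbers in the usage example (hidden size 768, rank 16, 24 adapted matrices, a 125M-parameter base model) are hypothetical and are not taken from the paper.

```python
def lora_trainable_fraction(d_in, d_out, rank, num_adapted_matrices, base_params):
    """Estimate the share of trainable parameters added by LoRA adapters.

    Each adapted weight matrix (d_out x d_in) stays frozen and gains two
    trainable low-rank factors: B (d_out x rank) and A (rank x d_in).
    """
    params_per_matrix = rank * (d_in + d_out)
    lora_params = num_adapted_matrices * params_per_matrix
    return lora_params, lora_params / base_params


# Hypothetical configuration, chosen only for illustration:
# 12 layers, adapters on 2 square projection matrices per layer.
lora_params, fraction = lora_trainable_fraction(
    d_in=768,
    d_out=768,
    rank=16,
    num_adapted_matrices=24,
    base_params=125_000_000,
)
print(f"trainable LoRA parameters: {lora_params}")
print(f"fraction of base model: {fraction:.4%}")
```

Under these assumed numbers the adapters add well under 1% of the base model's parameters. The actual ratio depends on the rank and on which matrices are adapted.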

Code embeddings are essential for semantic code search; however, current approaches often struggle to capture the precise syntactic and contextual nuances inherent in code. Open-source models such as CodeBERT and UniXcoder exhibit limitations in scalability and efficiency, while high-performing proprietary systems impose substantial computational costs. We introduce a parameter-efficient fine-tuning method based on Low-Rank Adaptation (LoRA) to construct task-specific adapters for code retrieval. Our approach reduces the number of trainable parameters to less than two percent of the base model, enabling rapid fine-tuning on extensive code corpora (2 million samples in 25 minutes on two H100 GPUs). Experiments demonstrate an increase of up to 9.1% in Mean Reciprocal Rank (MRR) for Code2Code search, and up to 86.69% for Text2Code search tasks across multiple programming languages. Distinction in task-wise and language-wise adaptation helps explore the sensitivity of code retrieval for syntactical and linguistic variations. To foster research in this area, we make our code and pre-trained models publicly available.
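The retrieval results above are reported in Mean Reciprocal Rank. As a reminder of what that metric measures, the following is a minimal sketch assuming one correct candidate per query and a ranked candidate list already produced by some embedding model. The function name and the toy identifiers are hypothetical, and the paper's own evaluation code may differ.

```python
def mean_reciprocal_rank(rankings, targets):
    """Compute MRR over a set of queries.

    rankings: one list per query of candidate ids, best match first.
    targets:  the single correct candidate id for each query.
    A query whose target is missing from its ranking contributes 0.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not targets:
        raise ValueError("at least one query is required")

    total = 0.0
    for ranking, target in zip(rankings, targets):
        for position, candidate in enumerate(ranking, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(targets)


# Toy example: targets found at rank 1, at rank 2, and not at all.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
targets = ["a", "y", "missing"]
mrr = mean_reciprocal_rank(rankings, targets)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```

An MRR of 0.5 can be read as the correct snippet appearing, on average, around the second position of the result list.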
