SE, LG | Dec 18, 2025

SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization

Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock

CMU

arXiv:2512.16956v2 | 3.4 | h-index: 12

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of improving retrieval accuracy for LLM-based coding agents in large codebases. It appears incremental, since it builds on existing dense embedding methods by adding graph-based features.

The paper tackles the problem of retrieving relevant code units for software issue localization. It proposes SpIDER, an enhanced dense retrieval approach that integrates LLM-based reasoning with graph-based exploration of the codebase, and reports an improvement of at least 13% over dense retrieval across programming languages and benchmarks.

Retrieving code functions, classes or files that are relevant in order to solve a given user query, bug report or feature request from large codebases is a fundamental challenge for Large Language Model (LLM)-based coding agents. Agentic approaches typically employ sparse retrieval methods like BM25 or dense embedding strategies to identify semantically relevant units. While embedding-based approaches can outperform BM25 by large margins, they often don't take into consideration the underlying graph-structured characteristics of the codebase. To address this, we propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that integrates LLM-based reasoning along with auxiliary information obtained from graph-based exploration of the codebase. We further introduce SpIDER-Bench, a graph-structured evaluation benchmark curated from SWE-PolyBench, SWEBench-Verified and Multi-SWEBench, spanning codebases from Python, Java, JavaScript and TypeScript programming languages. Empirical results show that SpIDER consistently improves dense retrieval performance by at least 13% across programming languages and benchmarks in SpIDER-Bench.
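The general idea of letting codebase graph structure inform a similarity-based ranking can be illustrated with a small sketch. This is not SpIDER's algorithm, which the abstract does not specify: the bag-of-words `embed` function is a toy stand-in for a real dense encoder, the blending weight `alpha`, the `graph_informed_rank` function, and the example code units and edges are all hypothetical, and the LLM-based reasoning step is omitted entirely.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graph_informed_rank(query, units, edges, alpha=0.7):
    """Rank code units by blending each unit's own similarity to the query
    with the best similarity among its graph neighbours (assumed scheme)."""
    q = embed(query)
    base = {name: cosine(q, embed(text)) for name, text in units.items()}
    scores = {}
    for name in units:
        neighbour_best = max(
            (base.get(n, 0.0) for n in edges.get(name, [])), default=0.0
        )
        scores[name] = alpha * base[name] + (1 - alpha) * neighbour_best
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical code units (name -> short description) and call-graph edges.
units = {
    "parse_config": "parse config file and load yaml settings",
    "load_yaml": "read yaml file from disk",
    "open_stream": "open stream handle",
    "render_page": "render html page template",
}
edges = {
    "parse_config": ["load_yaml", "open_stream"],
    "load_yaml": ["parse_config"],
    "open_stream": ["parse_config"],
    "render_page": [],
}

ranking = graph_informed_rank("crash when loading yaml config file", units, edges)
print(ranking)
```

In this toy setup, `open_stream` shares no words with the query, yet it ranks above `render_page` because it is connected to the highly scored `parse_config`. That is the kind of signal a purely text-based embedding ranking would miss.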
