Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets
This work addresses a natural use case: developers searching for bugfixing instructions and code samples. The contribution is incremental, however, since it builds on existing code search tasks.
The authors tackle the problem of searching with code snippets as queries, which existing datasets do not cover. They introduce the SearchBySnippet dataset and the SnippeR model, which achieves a Recall@10 of 0.451 and outperforms the baselines.
Code search is an important and well-studied task, but it usually means searching for code by a text query. We argue that using a code snippet (and possibly an error traceback) as a query while looking for bugfixing instructions and code samples is a natural use case not covered by prior art. Moreover, existing datasets use code comments rather than full-text descriptions as text, making them unsuitable for this use case. We present a new SearchBySnippet dataset implementing the search-by-code use case based on StackOverflow data; we show that on SearchBySnippet, existing architectures fall short of a simple BM25 baseline even after fine-tuning. We present a new single-encoder model, SnippeR, that outperforms several strong baselines on SearchBySnippet with a Recall@10 of 0.451; we propose the SearchBySnippet dataset and SnippeR as a new important benchmark for code search evaluation.
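
To make the evaluation setup concrete, the following is a minimal sketch of a BM25 ranker scored with Recall@k. It is an illustration under stated assumptions, not the implementation used for SearchBySnippet: the regex tokenizer, the parameters `k1` and `b`, the Lucene-style IDF variant, the toy documents, and the assumption that each query has exactly one relevant document are all hypothetical choices made here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive word tokenizer (an assumption, not the paper's preprocessing).
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by descending BM25 score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which is always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def recall_at_k(rankings, gold, k=10):
    """Fraction of queries whose single gold document is in the top k."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


# Toy corpus standing in for answer posts; queries mimic tracebacks.
docs = [
    "use pip install requests to fix ModuleNotFoundError",
    "convert a list to a string with join",
    "KeyError means the dictionary key is missing",
]
queries = ["ModuleNotFoundError: No module named requests", "KeyError: 'name'"]
gold = [0, 2]  # index of the relevant document for each query

rankings = [bm25_rank(q, docs) for q in queries]
print(recall_at_k(rankings, gold, k=1))
```

Under this single-relevant-document assumption, Recall@10 is the share of queries for which the correct post appears among the ten highest-ranked candidates.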