Generation-Augmented Query Expansion For Code Retrieval
The work addresses code retrieval for developers by enriching the query before search, though the contribution is incremental because it builds on existing pre-trained models.
The paper tackles the problem of code retrieval by proposing a generation-augmented query expansion framework that uses code generation models to augment documentation queries with generated code snippets, achieving new state-of-the-art results on the CodeSearchNet benchmark with significant improvements over baselines.
Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing the documentation-code pairs by embedding them into a latent space, without incorporating external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process of sketching an answer before searching, we use a powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it is helpful to augment the documentation query with its generation counterpart, that is, code snippets produced by the code generation model. To the best of our knowledge, this is the first attempt to leverage a code generation model to enhance the code retrieval task. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.
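The idea of expanding a query with generated code can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: `toy_generate_code` is a hypothetical stub standing in for a real code generation model, a bag-of-tokens cosine similarity stands in for learned embeddings, and plain concatenation in `expand_query` is only one possible way to combine the query with the generated snippet.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs, so "json.load" -> ["json", "load"].
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    # Assumption: a bag-of-tokens vector stands in for a learned encoder.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_generate_code(query):
    # Hypothetical stub: a real system would call a code generation model here.
    return (
        "def read_json(path):\n"
        "    with open(path) as f:\n"
        "        return json.load(f)"
    )


def expand_query(query, generate):
    # Assumption: the expanded query is the documentation text followed by
    # the generated snippet.
    return query + "\n" + generate(query)


def retrieve(query, corpus):
    # Return the index of the corpus snippet most similar to the query.
    q = embed(query)
    scores = [cosine(q, embed(snippet)) for snippet in corpus]
    return max(range(len(corpus)), key=scores.__getitem__)


corpus = [
    "def load_config(path):\n    with open(path) as f:\n        return json.load(f)",
    "def parse_file_name(file):\n    return file.split('.')[0]",
]
query = "Parse a file into a dictionary"

plain_hit = retrieve(query, corpus)
expanded_hit = retrieve(expand_query(query, toy_generate_code), corpus)
print("plain query ->", plain_hit)
print("expanded query ->", expanded_hit)
```

In this toy corpus the documentation query shares words only with the unrelated `parse_file_name` snippet, while the generated sketch shares code tokens with `load_config`. This is the kind of vocabulary gap that the generated counterpart is meant to bridge.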