SE, AI, IR, LG · Nov 24, 2020

Search4Code: Code Search Intent Classification Using Weak Supervision

arXiv:2011.11950v3 · 3 citations
Originality: Incremental advance
AI Analysis

This work provides a large-scale, real-world dataset and a method for classifying code search intent, which is valuable for improving code search tools for developers.

This paper addresses the challenge of classifying code search intent in web search queries for C# and Java. The authors propose a weak supervision approach and demonstrate that a CNN-based model achieves 77% accuracy for C# and 76% for Java on a dataset of over 1 million real-world queries.

Developers use search for various tasks, such as finding code, documentation, and debugging information. In particular, developers rely heavily on web search to find code examples and snippets while coding. Natural-language-based code search has recently been an active area of research. However, the lack of large-scale, real-world datasets is a significant bottleneck. In this work, we propose a weak-supervision-based approach for detecting code search intent in search queries for the C# and Java programming languages. We evaluate the approach against several baselines on a real-world dataset of over 1 million queries mined from the Bing web search engine, and show that the CNN-based model achieves an accuracy of 77% for C# and 76% for Java. Furthermore, we are releasing Search4Code, the first large-scale, real-world dataset of code search queries mined from the Bing web search engine. We hope that the dataset will aid future research on code search.
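The abstract does not spell out how the weak labels are produced. As a rough illustration of the general weak-supervision idea, and not the authors' actual method, the sketch below assumes a few hypothetical keyword-based labeling functions. Each function votes CODE, NOT_CODE, or abstains on a query, and the votes are combined by a simple majority vote. The resulting noisy labels could then be used to train a classifier such as a CNN. All names, keyword lists, and the label encoding are assumptions made for this example.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

# Label encoding chosen for this sketch (hypothetical, not from the paper).
ABSTAIN, NOT_CODE, CODE = -1, 0, 1

# Hypothetical keyword lists for illustration only.
CODE_HINTS = {"example", "sample", "snippet", "code", "syntax"}
NON_CODE_HINTS = {"download", "install", "salary", "jobs", "certification"}
LANGUAGE_TOKENS = {"c#", "csharp", "java"}


def tokens(query: str) -> Set[str]:
    """Lower-case whitespace tokenization; 'javascript' stays distinct from 'java'."""
    return set(query.lower().split())


def lf_code_hint(query: str) -> int:
    """Vote CODE if the query contains an explicit code-seeking word."""
    return CODE if tokens(query) & CODE_HINTS else ABSTAIN


def lf_non_code_hint(query: str) -> int:
    """Vote NOT_CODE if the query contains a word typical of non-code intent."""
    return NOT_CODE if tokens(query) & NON_CODE_HINTS else ABSTAIN


def lf_no_language(query: str) -> int:
    """Vote NOT_CODE if the query names neither C# nor Java; otherwise abstain."""
    return ABSTAIN if tokens(query) & LANGUAGE_TOKENS else NOT_CODE


LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_code_hint,
    lf_non_code_hint,
    lf_no_language,
]


def weak_label(query: str, lfs: List[Callable[[str], int]]) -> int:
    """Combine labeling-function votes by majority; ties and all-abstain give ABSTAIN."""
    votes = [v for v in (lf(query) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes of equal weight: leave unlabeled
    return ranked[0][0]


def build_training_set(queries: List[str]) -> List[Tuple[str, int]]:
    """Keep only queries that received a non-abstain weak label."""
    labeled = [(q, weak_label(q, LABELING_FUNCTIONS)) for q in queries]
    return [(q, y) for q, y in labeled if y != ABSTAIN]


if __name__ == "__main__":
    sample = ["c# list sort example", "java download", "java string split"]
    for query, label in build_training_set(sample):
        print(label, query)
```

In this sketch, a query is dropped when the heuristics conflict or all abstain. A learned label model, such as the one in the Snorkel framework, is a common alternative to plain majority vote when labeling functions have different accuracies.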

Code Implementations: 1 repo