Semantic Source Code Search: A Study of the Past and a Glimpse at the Future
This incremental study addresses the problem of inefficient code search for developers working with large, complex codebases.
The paper reviews existing methods for building source code search engines and identifies their limitations in handling high-level natural language queries, while outlining open research directions and obstacles toward a universal solution.
With the recent explosion in the size and complexity of codebases and software projects, the need for efficient source code search engines has increased dramatically. Unfortunately, existing information retrieval-based methods fail to capture query semantics and perform well only when the query contains syntax-based keywords. Consequently, such methods perform poorly on high-level natural language queries. In this paper, we review existing methods for building code search engines. We also outline open research directions and the obstacles that stand in the way of a universal source code search engine.
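
To make the keyword-dependence problem concrete, the following is a minimal sketch of a purely lexical matcher. It is not a method taken from the paper: the token-overlap scorer, the function names `tokenize` and `keyword_score`, the example snippet, and both queries are all hypothetical, chosen only to stand in for retrieval that relies on literal term matches.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (hypothetical helper).

    camelCase boundaries are separated first; underscores and punctuation
    act as delimiters, so `read_file` yields {"read", "file"}.
    """
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def keyword_score(query, snippet):
    """Fraction of query tokens that literally occur in the snippet.

    A toy stand-in for keyword-based retrieval, not a real IR ranking function.
    """
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(snippet)) / len(query_tokens)


# Illustrative code snippet to be searched (an assumption, not from the paper).
SNIPPET = "def read_file(path):\n    with open(path) as fh:\n        return fh.read()"

# A query that reuses the snippet's own identifiers matches every token.
keyword_query_score = keyword_score("read file path", SNIPPET)

# A natural language query describing the same behavior shares no tokens.
natural_query_score = keyword_score(
    "load the contents of a document from disk", SNIPPET
)

print(keyword_query_score, natural_query_score)
```

Under these assumptions, the keyword-style query scores highly because it repeats identifiers from the code, while the natural language description of the same behavior shares no tokens with the snippet and so cannot be retrieved by literal matching alone.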