Improved Query Reformulation for Concept Location using CodeRank and Document Structures
This work addresses a specific bottleneck for software developers in concept location tasks, offering an incremental, domain-specific improvement.
The paper tackles the problem of poor query formulation in software maintenance by proposing ACER, a technique that uses CodeRank and document structures to reformulate queries, improving 71% of baseline queries across eight systems.
During software maintenance, developers usually deal with a significant number of software change requests. As part of this, they often formulate an initial query from the request texts and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a. concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique, ACER, that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests an effective reformulation of the initial query by exploiting source document structures, query quality analysis, and machine learning. Experiments with 1,675 baseline queries from eight subject systems show that our technique improves 71% of the baseline queries, which is highly promising. Comparison with five closely related existing query reformulation techniques not only validates our empirical findings but also demonstrates the superiority of our technique.
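The abstract does not define CodeRank, so the following is only a rough sketch of the general idea of weighting source code terms with a graph-based score and using the top terms to expand a query. It assumes a PageRank-style score over a graph linking adjacent terms in source lines. The function names (`terms`, `term_graph`, `rank_terms`, `reformulate`), the adjacency rule, and the damping factor are hypothetical choices for illustration, not ACER's actual design. The document-structure analysis, query quality analysis, and machine learning steps are not modeled here.

```python
import re
from collections import defaultdict

def terms(text):
    # Split on camelCase boundaries and non-letters, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)]

def term_graph(lines):
    # Assumption: link each term to the term that immediately follows it.
    graph = defaultdict(set)
    for line in lines:
        ts = terms(line)
        for a, b in zip(ts, ts[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def rank_terms(graph, damping=0.85, iterations=50):
    # PageRank-style iteration on the undirected term graph.
    scores = {t: 1.0 for t in graph}
    for _ in range(iterations):
        scores = {
            t: (1 - damping)
            + damping * sum(scores[u] / len(graph[u]) for u in graph[t])
            for t in graph
        }
    return scores

def reformulate(query, lines, k=3):
    # Append the k highest-scoring code terms not already in the query.
    scores = rank_terms(term_graph(lines))
    seen = set(terms(query))
    candidates = sorted(
        (t for t in scores if t not in seen),
        key=lambda t: (-scores[t], t),  # ties broken alphabetically
    )
    return " ".join([query] + candidates[:k])

if __name__ == "__main__":
    code = ["FileParser parseFile", "parseFile readFile", "readFile closeFile"]
    print(reformulate("file error", code, k=2))
```

In this toy setup, terms that co-occur with many other terms receive higher scores and are therefore preferred as expansion candidates.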