CL | SE | Dec 20, 2022

CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

Stanford
arXiv:2212.10007v2 | 114 citations | h-index: 98
Originality: Incremental advance
AI Analysis

The work targets hallucinated or incorrect code completions in modular software development. It is a strong, specific gain rather than a broad paradigm shift.

The paper tackles code completion by incorporating cross-file context, which existing code language models ignore. When cross-file context is provided, it reports a 33.94% relative increase in exact match and a 28.69% relative increase in identifier matching over the existing code LM.

While pre-trained language models (LMs) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., in-file context, and ignore the rich semantics in other files within the same project, i.e., cross-file context, a critical source of information that is especially useful in modern modular software development. Overlooking this context limits what code language models can do in code completion and leads to unexpected behaviors such as generating hallucinated class member functions or function calls with unexpected arguments. In this work, we develop a cross-file context finder tool, CCFINDER, that effectively locates and retrieves the most relevant cross-file context. We propose CoCoMIC, a framework that incorporates cross-file context to learn the in-file and cross-file context jointly on top of pre-trained code LMs. CoCoMIC successfully improves the existing code LM with a 33.94% relative increase in exact match and a 28.69% relative increase in identifier matching for code completion when the cross-file context is provided.
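
To make "cross-file context" concrete, here is a minimal sketch of the general idea: follow the current file's imports into other project files and collect the signatures defined there. This is not CCFINDER or the CoCoMIC method, whose internals the abstract does not describe. The function names (`imported_modules`, `signatures`, `cross_file_context`) and the `project` mapping from module name to source text are hypothetical, and the sketch assumes Python sources that parse cleanly (Python 3.9+ for `ast.unparse`).

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the module names that `source` imports (absolute imports only)."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def signatures(source: str) -> list[str]:
    """Summarize top-level functions, classes, and their methods as one-line signatures."""
    lines: list[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            lines.append(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    lines.append(f"    def {item.name}({ast.unparse(item.args)})")
    return lines


def cross_file_context(current: str, project: dict[str, str]) -> str:
    """Build a text summary of project files that the current file imports.

    `project` is a hypothetical in-memory mapping: module name -> source text.
    Imports that do not resolve to a project module (e.g. stdlib) are skipped.
    """
    parts: list[str] = []
    for module in sorted(imported_modules(current) & set(project)):
        parts.append(f"# from {module}")
        parts.extend(signatures(project[module]))
    return "\n".join(parts)
```

A summary built this way could be placed ahead of the in-file prompt so the model sees that, for example, an imported class defines `area(self)` rather than guessing a member name. A real tool would also need to handle relative imports, incomplete code that does not parse, and ranking of the retrieved context by relevance.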

Code Implementations: 1 repo
