SE, CL · Jun 11, 2024

VersiCode: Towards Version-controllable Code Generation

arXiv:2406.07411v2 · 19 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a gap in deploying LLMs for realistic software development, where libraries are frequently updated. It is an incremental advance, however, because it builds on existing code generation research.

The paper tackles a failure of large language models (LLMs): they do not account for library updates when generating code. It proposes two new tasks, version-specific code completion and version-aware code migration, and introduces VersiCode, a dataset and metric for evaluating LLMs on these tasks. The evaluation finds that both tasks remain a significant challenge even for state-of-the-art models such as GPT-4o.
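The two tasks differ mainly in their inputs. Completion pins one library version, while migration needs both a source and a target version. The sketch below shows that difference. All class names, field names, and prompt wording are hypothetical illustrations and do not come from VersiCode's actual schema. The pandas example relies on one documented API change: `DataFrame.append` was removed in pandas 2.0 in favour of `pd.concat`.

```python
from dataclasses import dataclass


@dataclass
class CompletionTask:
    """Hypothetical VSCC-style instance: complete code for one pinned version."""
    library: str
    version: str
    description: str
    code_with_blank: str  # "<mask>" marks the span the model must fill in


@dataclass
class MigrationTask:
    """Hypothetical VACM-style instance: rewrite code between two versions."""
    library: str
    source_version: str
    target_version: str
    source_code: str


def completion_prompt(task: CompletionTask) -> str:
    # The pinned version is stated explicitly so the model must respect it.
    return (
        f"Using {task.library}=={task.version}, {task.description}\n"
        f"Complete the <mask> in:\n{task.code_with_blank}"
    )


def migration_prompt(task: MigrationTask) -> str:
    # Migration needs both endpoints of the version change.
    return (
        f"Migrate this code from {task.library}=={task.source_version} "
        f"to {task.library}=={task.target_version}:\n{task.source_code}"
    )


if __name__ == "__main__":
    vacm = MigrationTask(
        library="pandas",
        source_version="1.3.5",
        target_version="2.0.0",
        source_code="df = df.append(row_df, ignore_index=True)",
    )
    print(migration_prompt(vacm))
```

Completion is the special case where the model has no old code to rewrite. It has only a version constraint and a gap to fill.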

Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development, marked by frequent library updates. This gap significantly limits LLMs' deployment in realistic settings. In this paper, we propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM). In conjunction, we introduce VersiCode, a comprehensive Python dataset specifically designed to evaluate LLMs on these two tasks, together with a novel evaluation metric, Critical Diff Check (CDC@1), which assesses code generation against evolving API requirements. We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge, even for GPT-4o and other strong frontier models. We believe the novel tasks, dataset, and metric open up a new, important research direction that will further enhance LLMs' real-world applicability. The code and resources can be found at https://github.com/wutong8023/VersiCode.
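The abstract does not define how CDC@1 is computed. As a rough illustration of why exact-match scoring is not enough here, the sketch below checks one generated snippet against version-specific API requirements. The snippet must parse, it must call every required API, and it must call no API that was removed in the target version. This is a toy stand-in and not the paper's metric. The function names and the required/forbidden sets are hypothetical assumptions. Calls are matched only by their final name, so `list.append` would also be flagged as `append`.

```python
import ast


def called_names(code: str) -> set[str]:
    """Collect the final name of every call, e.g. 'concat' for pd.concat(...)."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
            elif isinstance(node.func, ast.Name):
                names.add(node.func.id)
    return names


def critical_api_check(code: str, required: set[str], forbidden: set[str]) -> bool:
    """Toy check (hypothetical, not CDC@1): valid syntax, required APIs used,
    no APIs removed in the target version."""
    try:
        calls = called_names(code)
    except SyntaxError:
        return False  # unparsable output fails outright
    return required <= calls and not (forbidden & calls)


if __name__ == "__main__":
    # Target: pandas 2.x, where DataFrame.append no longer exists.
    print(critical_api_check("out = pd.concat([df, row_df], ignore_index=True)",
                             required={"concat"}, forbidden={"append"}))
```

An AST-based check can flag a snippet that looks plausible but calls an API from the wrong version. Plain string similarity to a reference solution would miss that.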

Code Implementations: 1 repo