CL · SE · Oct 28, 2024

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

arXiv:2410.21157v1 · 21 citations · h-index: 15
Originality: Synthesis-oriented
AI Analysis

This paper addresses two gaps in code completion benchmarks that matter to software engineering researchers and developers: limited language coverage and coarse-grained evaluation. The contribution is incremental, since it builds on existing benchmarks.

The authors tackle the lack of multilingual evaluation for repository-level code completion by introducing two resources: M2RC-EVAL, a benchmark covering 18 programming languages with fine-grained annotations, and M2RC-INSTRUCT, an instruction dataset. Their experiments show that M2RC-INSTRUCT improves code LLM performance.

Repository-level code completion has drawn great attention in software engineering, and several benchmark datasets have been introduced. However, existing repository-level code completion benchmarks usually cover only a few languages (<5), so they cannot evaluate the general code intelligence of existing code Large Language Models (LLMs) across languages. In addition, existing benchmarks usually report overall scores averaged across languages, which ignores fine-grained abilities in different completion scenarios. Therefore, to facilitate research on code LLMs in multilingual scenarios, we propose M2RC-EVAL, a massively multilingual repository-level code completion benchmark covering 18 programming languages. It provides two types of fine-grained annotations (bucket-level and semantic-level) on different completion scenarios, both obtained from the parsed abstract syntax tree. Moreover, we curate a massively multilingual instruction corpus, M2RC-INSTRUCT, to improve the repository-level code completion abilities of existing code LLMs. Comprehensive experimental results demonstrate the effectiveness of M2RC-EVAL and M2RC-INSTRUCT.
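To make the two annotation types concrete, here is a minimal sketch of how a bucket-level and a semantic-level label could be read off a parsed syntax tree. It is an illustration only, not the benchmark's tooling: it uses Python's built-in ast module on Python source, and the function name annotate, the default of 10 buckets, and the rule "bucket = relative depth of the innermost enclosing statement" are all assumptions made for this example.

```python
import ast


def annotate(source: str, line: int, num_buckets: int = 10):
    """Return (bucket, semantic_label) for the completion position `line`.

    Assumptions for this sketch (not taken from the paper):
      * the semantic label is the type name of the innermost statement
        node whose line span covers `line`;
      * the bucket is that node's depth, scaled by the maximum tree depth
        into `num_buckets` equal-width bins.
    """
    tree = ast.parse(source)
    best = None      # (depth, node) of the innermost covering statement
    max_depth = 0    # deepest node seen anywhere in the tree

    def walk(node, depth):
        nonlocal best, max_depth
        max_depth = max(max_depth, depth)
        start = getattr(node, "lineno", None)
        end = getattr(node, "end_lineno", None)
        covers = start is not None and end is not None and start <= line <= end
        if isinstance(node, ast.stmt) and covers:
            if best is None or depth > best[0]:
                best = (depth, node)
        for child in ast.iter_child_nodes(node):
            walk(child, depth + 1)

    walk(tree, 0)
    if best is None:
        raise ValueError(f"no statement covers line {line}")

    depth, node = best
    bucket = min(num_buckets - 1, depth * num_buckets // (max_depth + 1))
    return bucket, type(node).__name__


if __name__ == "__main__":
    example = "def f(x):\n    return x + 1\n"
    print(annotate(example, 1))  # position on the function header
    print(annotate(example, 2))  # position inside the function body
```

Grouping scores by the returned bucket or label, rather than averaging over all samples, is what allows per-scenario reporting of the kind the abstract describes. A multilingual setup would need a parser that handles every target language, which the ast module does not.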
