CL · SE · Mar 8, 2024

CommitBench: A Benchmark for Commit Message Generation

arXiv:2403.05188v1 · 13 citations · h-index: 46 · Has Code · SANER
Originality: Synthesis-oriented
AI Analysis

This work addresses the tedious task of writing commit messages by providing a high-quality benchmark to accelerate research on automating it. The contribution is incremental: it builds on existing methods and mainly improves the data.

The authors tackled the problem of automating commit message generation by creating CommitBench, a large-scale benchmark dataset that addresses issues in existing datasets, and found that a Transformer model pretrained on source code outperformed other approaches.

Writing commit messages is a tedious daily task for many software developers, and often remains neglected. Automating this task has the potential to save time while ensuring that messages are informative. A high-quality dataset and an objective benchmark are vital preconditions for solid research and evaluation towards this goal. We show that existing datasets exhibit various problems, such as the quality of the commit selection, small sample sizes, duplicates, privacy issues, and missing licenses for redistribution. This can lead to unusable models and skewed evaluations, where inferior models achieve higher evaluation scores due to biases in the data. We compile a new large-scale dataset, CommitBench, adopting best practices for dataset creation. We sample commits from diverse projects with licenses that permit redistribution and apply our filtering and dataset enhancements to improve the quality of generated commit messages. We use CommitBench to compare existing models and show that other approaches are outperformed by a Transformer model pretrained on source code. We hope to accelerate future research by publishing the source code (https://github.com/Maxscha/commitbench).
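To make the dataset problems named in the abstract (duplicates, privacy issues, low-quality commits) more concrete, here is a minimal Python sketch of the kind of filtering such a dataset might need. It is not the CommitBench pipeline: the record layout (`diff`, `message`), the `clean_commits` and `mask_emails` helpers, the `<EMAIL>` placeholder, and the `min_words` threshold are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical pattern for e-mail addresses; real privacy filtering would
# likely need to cover names, tokens, and other identifiers as well.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text):
    """Replace anything that looks like an e-mail address with a placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


def clean_commits(commits, min_words=2):
    """Filter a list of {"diff": str, "message": str} records.

    Assumed steps (illustrative only):
      1. mask e-mail addresses in the message,
      2. drop messages shorter than `min_words` words,
      3. drop exact duplicates of the (diff, message) pair.
    """
    seen = set()
    kept = []
    for commit in commits:
        message = mask_emails(commit["message"].strip())
        diff = commit["diff"]

        # Very short messages ("wip", "fix") carry little signal.
        if len(message.split()) < min_words:
            continue

        # Hash the pair so duplicates are detected without storing full text.
        key = hashlib.sha256((diff + "\0" + message).encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)

        kept.append({"diff": diff, "message": message})
    return kept


if __name__ == "__main__":
    sample = [
        {"diff": "+a", "message": "Fix bug in parser"},
        {"diff": "+a", "message": "Fix bug in parser"},  # exact duplicate
        {"diff": "+b", "message": "wip"},                # too short
        {"diff": "+c", "message": "Update docs, thanks bob@example.com"},
    ]
    for record in clean_commits(sample):
        print(record["message"])
```

Exact-match hashing only catches verbatim duplicates; near-duplicates (for example, the same change cherry-picked across branches) would need a fuzzier comparison, which this sketch does not attempt.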

Code Implementations: 1 repo
