SE · May 4

CommitSuite: A Comprehensive Benchmark for Commit Classification and Message Generation

arXiv:2605.02256 · 58.4 · Has Code
AI Analysis

For software engineering researchers, this provides a large-scale, semantically annotated benchmark for structured commit understanding and evaluation, addressing the lack of such resources.

CommitSuite introduces a benchmark of 63,533 CCS-compliant commits from 243 repositories across 7 languages, with AST-level code changes and LLM-assisted semantic annotations. It proposes a reference-free evaluation framework achieving 0.849 Cohen's Kappa agreement with human judgments, enabling reproducible research on commit classification and message generation.
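For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Cohen's Kappa for two raters. It is the textbook formula, not the paper's evaluation code. The function name `cohens_kappa` and the example labels are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length sequences of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary verdicts (1 = pass, 0 = fail) on ten commit messages.
human_verdicts = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
llm_verdicts = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(human_verdicts, llm_verdicts)
```

In this made-up example the raters agree on 9 of 10 items and chance agreement is 0.5, so Kappa is 0.8. Kappa discounts the agreement two raters would reach by chance, which is why it is preferred over raw accuracy for binary judgments.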

High-quality commit messages are critical for maintaining software projects, yet ensuring their consistency and informativeness remains a practical challenge. While the Conventional Commits Specification (CCS) provides a structured format for commit messages, research on CCS-based commit classification and commit message generation (CMG) is limited by the absence of large-scale benchmarks, semantic annotations, and reliable evaluation methods. In this paper, we introduce CommitSuite, a benchmark comprising 63,533 CCS-compliant commits from 243 open-source repositories across seven programming languages. Each commit is labeled with its CCS type and enriched with AST-level code changes, along with LLM-assisted semantic annotations that capture the "what" and "why" behind the change. To evaluate CMG systems, we propose a reference-free framework based on five binary metrics: rationality, comprehensiveness, non-redundancy, authenticity, and logicality, enabling semantic-level assessment without relying on human-written references. Our experiments show that LLMs can effectively support both generation and evaluation, with evaluation achieving 0.849 Cohen's Kappa agreement against human judgments. CommitSuite offers a unified resource for structured commit understanding and facilitates reproducible research on commit classification and generation.
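As a rough illustration of what "CCS-compliant" means in practice, the sketch below parses the header line of a commit message in the Conventional Commits form `<type>[(scope)][!]: <description>`. The regular expression, the `CommitHeader` type, and `parse_ccs_header` are assumptions made for this example. They are not CommitSuite's filtering pipeline, and the sketch does not restrict `type` to any particular set of labels.

```python
import re
from typing import NamedTuple, Optional


class CommitHeader(NamedTuple):
    type: str                 # e.g. "feat", "fix"
    scope: Optional[str]      # text inside the optional parentheses
    breaking: bool            # True when "!" precedes the colon
    description: str          # short summary after ": "


# Header shape from the Conventional Commits spec: type, optional (scope),
# optional "!", then a colon, a space, and a non-empty description.
_HEADER_RE = re.compile(
    r"^(?P<type>[A-Za-z]+)"
    r"(?:\((?P<scope>[^()\r\n]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_ccs_header(message: str) -> Optional[CommitHeader]:
    """Return the parsed header, or None if the first line is not CCS-shaped."""
    lines = message.splitlines()
    if not lines:
        return None
    match = _HEADER_RE.match(lines[0])
    if match is None:
        return None
    return CommitHeader(
        type=match.group("type").lower(),
        scope=match.group("scope"),
        breaking=match.group("breaking") is not None,
        description=match.group("description").strip(),
    )


# Hypothetical commit messages for illustration.
scoped = parse_ccs_header("fix(parser): handle empty input\n\nAvoids a crash on blank files.")
breaking = parse_ccs_header("feat!: drop legacy config format")
free_form = parse_ccs_header("Update README")
```

Here `scoped` carries type `fix` and scope `parser`, `breaking` is flagged as a breaking change, and `free_form` is `None` because the message has no CCS header.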
