CL · LG · SE · Aug 23, 2021

ComSum: Commit Messages Summarization and Meaning Preservation

arXiv:2108.10763v1 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better summarization tools for software developers, but it is incremental: its contributions are a new dataset and a new evaluation criterion.

The authors tackled the problem of summarizing commit messages in software development by creating ComSum, a dataset of 7 million commit messages, and proposed evaluating summaries not just by ROUGE scores but also by meaning preservation.

We present ComSum, a data set of 7 million commit messages for text summarization. When documenting commits, software code changes, both a message and its summary are posted. We gather and filter those to curate developers' work summarization data set. Along with its growing size, practicality and challenging language domain, the data set benefits from the living field of empirical software engineering. As commits follow a typology, we propose to not only evaluate outputs by Rouge, but by their meaning preservation.
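The message/summary pairing and the ROUGE baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common Git convention that the first line of a commit is its summary and the rest is the message body. The helper names `split_commit` and `rouge1_f1` are hypothetical, and the score is a simplified unigram-overlap F1 on whitespace tokens, not the official ROUGE implementation. The sketch does not reproduce the paper's filtering or its meaning-preservation evaluation.

```python
from collections import Counter


def split_commit(message: str) -> tuple[str, str]:
    """Split a raw commit message into (summary, body).

    Assumption: the first line is the summary (subject line) and
    everything after it is the longer message body.
    """
    subject, _, rest = message.strip().partition("\n")
    return subject.strip(), rest.strip()


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative commit text, invented for this example.
raw = (
    "Fix null check in parser\n"
    "\n"
    "The parser crashed when the input was empty.\n"
    "Add a guard and a regression test."
)
summary, body = split_commit(raw)
score = rouge1_f1("fix null check", summary)  # 0.75
```

A surface-overlap score like this one rewards shared words, not shared meaning, which is why the abstract proposes evaluating meaning preservation in addition to ROUGE.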

Code Implementations: 1 repo
