CL, May 29, 2021

CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model

arXiv:2105.14242v1715 citations, Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for better collaboration among developers by automating commit message generation, though it is incremental as it builds on existing neural machine translation approaches.

The authors tackle automatic generation of commit messages from source code changes. They release a dataset of 345K code modification and commit message pairs across six programming languages and propose two training methods to improve generation quality, measured with BLEU-4.

A commit message is a document that summarizes source code changes in natural language. A good commit message clearly describes the changes and thereby improves collaboration between developers. Our goal is therefore to develop a model that writes commit messages automatically. To this end, we release a dataset of 345K pairs of code modifications and commit messages in six programming languages (Python, PHP, Go, Java, JavaScript, and Ruby). As in neural machine translation (NMT), we feed the code modification to the encoder and the commit message to the decoder, and we evaluate the generated commit messages with BLEU-4. We also propose two training methods to improve generation: (1) preprocessing the code modification before feeding it to the encoder; (2) initializing the model with weights suited to the code domain, to reduce the gap in contextual representation between programming language (PL) and natural language (NL). Training code, dataset, and pre-trained weights are available at https://github.com/graykode/commit-autosuggestions
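
The two ends of the pipeline described above, input preprocessing and BLEU-4 evaluation, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the code modification is preprocessed, so splitting a unified diff into added and deleted lines is an assumption made here for illustration, and `split_diff` and `bleu4` are hypothetical helper names. The abstract also does not specify which BLEU-4 variant is used; the sketch shows an unsmoothed sentence-level score against a single reference.

```python
import math
from collections import Counter


def split_diff(diff_text):
    """Split a unified diff into added and deleted lines.

    Assumption for illustration only: the abstract does not describe
    the actual preprocessing step.
    """
    added, deleted = [], []
    for line in diff_text.splitlines():
        # Skip file headers such as '--- a/f.py' and '+++ b/f.py'.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added.append(line[1:].strip())
        elif line.startswith("-"):
            deleted.append(line[1:].strip())
    return added, deleted


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(reference, candidate):
    """Unsmoothed sentence-level BLEU-4 against a single reference."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, 5):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0
        log_precision += math.log(matches / total) / 4
    # Brevity penalty for candidates no longer than the reference.
    if len(cand) > len(ref):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref) / len(cand))
    return penalty * math.exp(log_precision)


diff = "--- a/f.py\n+++ b/f.py\n@@ -1 +1 @@\n-print('hi')\n+print('hello')\n"
added, deleted = split_diff(diff)
score = bleu4("fix typo in readme file", "fix typo in readme file")
```

In this sketch a candidate with no matching 4-gram scores zero. Published evaluations usually apply smoothing or compute the score at corpus level, so numbers from this function are not comparable to reported results.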

Code Implementations: 1 repo