SE · AI · LG · Oct 14, 2023

Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy

arXiv:2310.11467v1 · 1 citation · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses code comment quality assessment for software developers, but it is incremental: it builds on existing classification methods by adding data augmentation.

The researchers tackled binary classification of code comment quality by augmenting a dataset of 9,048 C-language code-comment pairs with AI-generated pairs, producing two classification models with improved accuracy.

This report focuses on enhancing a binary code comment quality classification model by integrating generated code and comment pairs to improve model accuracy. The dataset comprises 9,048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using a large language model architecture, and these generated pairs are labeled to indicate their utility. The outcome of this effort is two classification models: one trained on the original dataset and another trained on the augmented dataset, which adds the newly generated code-comment pairs and their labels.
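To make the two-model setup concrete, here is a minimal sketch in Python. It assumes a multinomial naive Bayes classifier over word tokens, which is a stand-in chosen for illustration; the summary does not specify the paper's classifier, features, or LLM prompting. Everything named here (`Pair`, `features`, `NaiveBayes`, `augment`, `accuracy`) and all of the toy pairs are hypothetical. The `generated` list merely stands in for LLM-generated pairs that have already been labeled. The sketch trains one model on the original pairs and one on the original plus generated pairs, then scores both on the same held-out pairs.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Binary labels matching the dataset's "Useful" / "Not Useful" annotations.
USEFUL, NOT_USEFUL = 1, 0


@dataclass(frozen=True)
class Pair:
    """A code-comment pair with its label (hypothetical structure)."""
    code: str
    comment: str
    label: int


def features(pair: Pair) -> list[str]:
    # Hypothetical featurization: lowercase word tokens from the comment, plus
    # identifier tokens from the code, prefixed "code:" to keep them distinct.
    comment_toks = re.findall(r"[a-z_]+", pair.comment.lower())
    code_toks = ["code:" + t for t in re.findall(r"[a-z_]+", pair.code.lower())]
    return comment_toks + code_toks


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, pairs: list[Pair]) -> "NaiveBayes":
        self.total = len(pairs)
        self.class_counts = Counter(p.label for p in pairs)
        self.token_counts = {c: Counter() for c in (USEFUL, NOT_USEFUL)}
        for p in pairs:
            self.token_counts[p.label].update(features(p))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def predict(self, pair: Pair) -> int:
        best_label, best_score = USEFUL, -math.inf
        for c in (USEFUL, NOT_USEFUL):
            counts = self.token_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.class_counts[c] + 1) / (self.total + 2))
            for tok in features(pair):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def augment(original: list[Pair], generated: list[Pair]) -> list[Pair]:
    # Augmented training set = original pairs + labeled generated pairs.
    return original + generated


def accuracy(model: NaiveBayes, pairs: list[Pair]) -> float:
    return sum(model.predict(p) == p.label for p in pairs) / len(pairs)


# Toy stand-ins; NOT data from the paper.
original = [
    Pair("int n = count(list);", "returns the number of nodes in the linked list", USEFUL),
    Pair("i++;", "increment i", NOT_USEFUL),
    Pair("free(buf);", "release the buffer allocated for the packet header", USEFUL),
    Pair("x = 0;", "set x to zero", NOT_USEFUL),
]
# Placeholders for LLM-generated pairs that have already been labeled.
generated = [
    Pair("close(fd);", "close the file descriptor so the socket can be reused", USEFUL),
    Pair("j--;", "decrement j", NOT_USEFUL),
]
held_out = [
    Pair("fclose(fp);", "close the log file before the process exits", USEFUL),
    Pair("k++;", "increment k", NOT_USEFUL),
]

baseline = NaiveBayes().fit(original)
augmented = NaiveBayes().fit(augment(original, generated))
print("original only:", accuracy(baseline, held_out))
print("original + generated:", accuracy(augmented, held_out))
```

With only a handful of pairs, any accuracy difference this prints illustrates the mechanism and says nothing about the paper's reported results. The design choice that matters is keeping the evaluation pairs fixed across both models, so that the only variable is whether the generated pairs were added to the training set.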
