SE · AI · Oct 13, 2023

A ML-LLM pairing for better code comment classification

arXiv:2310.10275v1 · 2 citations · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses code comment classification for software engineering, but it is incremental as it builds on existing methods with minor improvements.

The paper tackles the code comment classification task by comparing classical machine learning systems and by augmenting the training data with LLM-generated examples. The best model, a neural network, reaches a Macro-F1 score of 88.401% on the provided seed data, and the LLM-generated data yields a 1.5% overall increase in performance.

The "Information Retrieval in Software Engineering (IRSE)" shared task at FIRE 2023 introduces code comment classification, a challenging task that pairs a code snippet with a comment that should be evaluated as either useful or not useful to the understanding of the relevant code. We address this shared task with a two-fold evaluation. From an algorithmic perspective, we compare the performance of classical machine learning systems. From a data-driven perspective, we generate additional data with the help of large language model (LLM) prompting and measure the potential increase in performance. Our best model, which took second place in the shared task, is a neural network with a Macro-F1 score of 88.401% on the provided seed data and a 1.5% overall increase in performance on the data generated by the LLM.
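The abstract reports results as Macro-F1, which averages the per-class F1 scores so that the "useful" and "not useful" classes count equally regardless of how many examples each has. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `macro_f1` and the label strings `"useful"` and `"not_useful"` are assumptions chosen for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    # Every label seen in either the gold or the predicted list is a class.
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators by falling back to 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold labels and predictions for four code-comment pairs.
gold = ["useful", "useful", "not_useful", "not_useful"]
pred = ["useful", "not_useful", "not_useful", "not_useful"]

score = macro_f1(gold, pred)
print(f"Macro-F1: {score:.4f}")  # F1 is 2/3 for "useful" and 4/5 for "not_useful"
```

In this toy example the two per-class F1 scores are 2/3 and 4/5, so the Macro-F1 is their mean, 11/15. Accuracy on the same predictions would be 3/4, which shows that the two metrics can differ even on a balanced sample.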

