SE AI CL LG | Oct 14, 2023

A study of the impact of generative AI-based data augmentation on software metadata classification

Tripti Kumari, Chakali Sai Charan, Ayan Das

arXiv:2310.13714v1 | 1.7 | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses software metadata classification, a problem relevant to developers and researchers. It is an incremental advance because it applies existing methods to a specific dataset.

The study tackles the problem of predicting the usefulness of code-comment pairs in software metadata classification. It trains a machine learning model on neural contextual representations of each pair, and incorporating LLM-generated data yields a 4% increase in F1-score over the baseline.
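For readers unfamiliar with the metric, the sketch below shows how an F1-score for the "useful" class can be computed and compared between two systems. The labels and predictions are invented for illustration and are not the paper's data or results. The summary also does not say whether the reported 4% is an absolute or a relative gain, so the sketch prints both.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels: 1 = useful comment, 0 = not useful.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
# Hypothetical predictions from a baseline and from a model trained with extra data.
baseline_pred = [1, 0, 1, 0, 1, 0, 0, 1]
augmented_pred = [1, 1, 1, 0, 1, 0, 0, 1]

f1_base = f1_score(y_true, baseline_pred)
f1_aug = f1_score(y_true, augmented_pred)
print(f"baseline F1:  {f1_base:.3f}")
print(f"augmented F1: {f1_aug:.3f}")
print(f"absolute gain: {f1_aug - f1_base:.3f}")
print(f"relative gain: {(f1_aug - f1_base) / f1_base:.1%}")
```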

This paper presents the system submitted by the team from IIT (ISM) Dhanbad to FIRE IRSE 2023 shared task 1 on automatically predicting the usefulness of code-comment pairs, and examines the impact of adding Large Language Model (LLM) generated data to the original base data for the associated source code. We developed a framework that trains a machine learning model on neural contextual representations of the comments and their corresponding code to predict the usefulness of each code-comment pair, and we analyze performance when LLM-generated data is combined with the base data. In the official assessment, our system achieves a 4% increase in F1-score over the baseline, and the assessment also considers the quality of the generated data.
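The overall shape of such a pipeline can be sketched in a few lines. This is not the authors' implementation: a hashed bag-of-tokens vector stands in for the neural contextual representations, a small logistic regression stands in for the unspecified classifier, and every name and data row below (`embed`, `featurize`, `base_data`, `generated_data`) is hypothetical. The "generated" rows are hand-written placeholders for LLM-labelled pairs.

```python
import math
import re
import zlib

DIM = 64  # size of each stand-in embedding (assumption)


def embed(text, dim=DIM):
    """Stand-in for a contextual encoder: L2-normalised hashed token counts."""
    vec = [0.0] * dim
    for tok in re.findall(r"[a-z_]+|\d+", text.lower()):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def featurize(comment, code):
    """Represent a code-comment pair by concatenating both embeddings."""
    return embed(comment) + embed(code)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=200, lr=0.5):
    """Logistic regression fitted with plain stochastic gradient descent."""
    data = [(featurize(comment, code), label) for comment, code, label in rows]
    w, b = [0.0] * (2 * DIM), 0.0
    for _ in range(epochs):
        for x, y in data:
            grad = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(model, comment, code):
    """Return 1 (useful) or 0 (not useful) for one code-comment pair."""
    w, b = model
    x = featurize(comment, code)
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical base data: (comment, code, label), 1 = useful.
base_data = [
    ("retry up to three times before giving up", "for (i = 0; i < 3; i++) send(pkt);", 1),
    ("increment i", "i++;", 0),
    ("buffer must be freed by the caller", "char *buf = malloc(len);", 1),
    ("loop", "while (n > 0) n--;", 0),
]
# Hypothetical stand-in for LLM-generated, LLM-labelled pairs.
generated_data = [
    ("returns -1 when the file cannot be opened", "if (!fp) return -1;", 1),
    ("set x to zero", "x = 0;", 0),
]

baseline_model = train(base_data)
augmented_model = train(base_data + generated_data)
print(predict(augmented_model, "close the handle to avoid a leak", "fclose(fp);"))
```

Comparing `baseline_model` and `augmented_model` on a held-out labelled set, using F1-score, mirrors the comparison the abstract describes.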
