Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023
This work addresses the challenge of improving software metadata evaluation for software engineers and researchers, but it is incremental as it builds on existing machine learning frameworks and datasets.
The paper addresses the automated evaluation of code comments by organizing a track with a single binary classification task: classify each comment as useful or not useful. The dataset comprises 9048 pairs of code comments and code snippets from open-source projects, supplemented by additional LLM-generated data. In total, 17 teams submitted 56 experiments, which were evaluated quantitatively with F1-Scores and through qualitative analysis.
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for the automated evaluation of code comments in a machine learning framework based on labels generated by humans and by large language models. The track consists of a binary classification task in which comments are classified as useful or not useful. The dataset consists of 9048 pairs of code comments and their surrounding code snippets, extracted from open-source, C-based GitHub projects, together with an additional dataset generated individually by each team using large language models. In total, 56 experiments were submitted by 17 teams from various universities and software companies. The submissions were evaluated quantitatively using the F1-Score and qualitatively based on the type of features developed, the supervised learning model used, and the corresponding hyper-parameters. The labels generated by large language models increase the bias in the prediction model but lead to less over-fitted results.
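As an illustration of the quantitative criterion, the following is a minimal sketch of how an F1-Score can be computed for a binary useful / not useful classifier. It assumes that "useful" is treated as the positive class and encoded as 1; the abstract does not state the averaging scheme or label encoding used by the track, so the function name `f1_score_binary` and the toy labels are hypothetical and serve only as an example.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 of the positive class: harmonic mean of precision and recall.

    Assumption: labels are integers and `positive` marks the "useful" class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: useful comments predicted as useful.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: not-useful comments predicted as useful.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: useful comments predicted as not useful.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # Precision and recall are both zero (or undefined), so report 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = useful, 0 = not useful.
gold = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
score = f1_score_binary(gold, predicted)
print(f"F1 = {score:.3f}")
```

In this toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-Score is also 2/3.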