Mehedi Sun

8.9SEApr 11Code

Fine-grained Multi-Document Extraction and Generation of Code Change Rationale

Mehedi Sun, Antu Saha, Nadeeshan De Silva et al.

Understanding the reasons behind past code changes is critical for many software engineering tasks, including refactoring and reviewing code, diagnosing bugs, and implementing new features. Unfortunately, locating and reconstructing this rationale can be difficult for developers because the information is often fragmented, inconsistently documented, and scattered across different artifacts such as commit messages, issue reports, and PRs. In this paper, we address this challenge in two steps. First, we conduct an empirical study of 63 commits from five open-source Java projects to analyze how rationale components (e.g., a change's goal, need, and alternative) are distributed across artifacts. We find that the rationale is highly fragmented: commit messages and pull requests primarily capture goals, while needs and alternatives are more often found in issues and PRs. Other components are scarce but found in artifacts other than commit messages. No single artifact type captures all components, underscoring the need for cross-document reasoning and synthesis. Second, we introduce ARGUS, an LLM-based approach that identifies sentences expressing goal, need, and alternative across a commit's artifacts and creates concise rationale summaries to support code comprehension and maintenance tasks. We evaluated ARGUS on the 63 commits and compared its performance against baseline variants. The best-performing version achieved 51.4% precision and 93.2% recall for rationale identification, while producing rationale summaries rated as accurate. A user study with 12 Java developers further showed that these summaries were perceived as useful and helpful for tasks such as code review, documentation, and debugging. Our results highlight the need for multi-document reasoning in capturing rationale and demonstrate the potential of ARGUS to help developers understand and maintain software systems.

9.0SEMar 23

Evaluating Language Model Applications for Identifying Solution-Related Content in Issue Report Discussions

Antu Saha, Mehedi Sun, Oscar Chaparro

During issue resolution, software developers rely on issue reports to discuss solutions for defects, feature requests, and other changes. These discussions contain proposed solutions--from design changes to code implementations--as well as their evaluations. Locating solution-related content is essential for investigating reopened issues, addressing regressions, reusing solutions, and understanding code change rationale. Manually understanding long discussions to identify such content can be difficult and time-consuming. This paper automates solution identification using language models as supervised classifiers. We investigate three applications--embeddings, prompting, and fine-tuning--across three classifier types: traditional ML models (MLMs), pre-trained language models (PLMs), and large language models (LLMs). Using 356 Mozilla Firefox issues, we created a dataset to train and evaluate six MLMs, four PLMs, and two LLMs across 68 configurations. Results show that MLMs with LLM embeddings outperform TF-IDF features, prompting underperforms, and fine-tuned LLMs achieve the highest performance, with LLAMAft reaching 0.716 F1 score. Ensembles of the best models further improve results (0.737 F1). Misclassifications often arise from misleading clues or missing context, highlighting the need for context-aware classifiers. Models trained on Mozilla transfer to other projects, with a small amount of project-specific data, further enhancing results. This work supports software maintenance, issue understanding, and solution reuse.

Mehedi Sun

2 Papers