SE · AI · CL · LG · Dec 5, 2024

Synergizing LLMs and Knowledge Graphs: A Novel Approach to Software Repository-Related Question Answering

arXiv:2412.03815v2 · 3 citations · h-index: 8 · Has Code · ACM Trans Softw Eng Methodol
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making repository data accessible to software developers, though the contribution is incremental because it builds on existing LLM and knowledge graph methods.

The study tackled the low accuracy of LLM-based chatbots that answer software repository questions by augmenting them with knowledge graphs. With few-shot chain-of-thought prompting, the approach reached 84% accuracy and significantly outperformed the baselines (MSRBot and GPT-4o-search-preview); in a user study with 20 participants, users completed more tasks correctly and in less time.

Software repositories contain valuable information for understanding the development process. However, extracting insights from repository data is time-consuming and requires technical expertise. While software engineering chatbots support natural language interactions with repositories, chatbots struggle to understand questions beyond their trained intents and to accurately retrieve the relevant data. This study aims to improve the accuracy of LLM-based chatbots in answering repository-related questions by augmenting them with knowledge graphs. We use a two-step approach: constructing a knowledge graph from repository data, and synergizing the knowledge graph with an LLM to handle natural language questions and answers. We curated 150 questions of varying complexity and evaluated the approach on five popular open-source projects. Our initial results revealed the limitations of the approach, with most errors due to the reasoning ability of the LLM. We therefore applied few-shot chain-of-thought prompting, which improved accuracy to 84%. We also compared against baselines (MSRBot and GPT-4o-search-preview), and our approach performed significantly better. In a task-based user study with 20 participants, users completed more tasks correctly and in less time with our approach, and they reported that it was useful. Our findings demonstrate that LLMs and knowledge graphs are a viable solution for making repository data accessible.
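The abstract describes the two-step approach only at a high level. The following is a minimal Python sketch of the data flow under loudly stated assumptions: the `Commit` schema, the relation names (`AUTHORED`, `MODIFIED`, `FIXES`), the in-memory triple set, and every helper function are hypothetical and are not the authors' implementation. Step one converts repository records into (subject, relation, object) triples. `top_contributor_for_file` is a hand-written stand-in for the kind of graph traversal that answers a repository question, and `build_prompt` shows the general shape of a few-shot chain-of-thought prompt, where each example pairs a question with explicit reasoning before its answer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical repository record; the paper's real graph schema is not given in the abstract.
@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    files: Tuple[str, ...]
    fixes_issue: Optional[str] = None


def build_graph(commits):
    """Step 1 (sketch): turn repository records into (subject, relation, object) triples."""
    triples = set()
    for c in commits:
        triples.add((c.author, "AUTHORED", c.sha))
        for path in c.files:
            triples.add((c.sha, "MODIFIED", path))
        if c.fixes_issue is not None:
            triples.add((c.sha, "FIXES", c.fixes_issue))
    return triples


def subjects(triples, relation, obj):
    """All nodes that have an edge of type `relation` pointing at `obj`."""
    return {s for s, r, o in triples if r == relation and o == obj}


def top_contributor_for_file(triples, path):
    """Hypothetical stand-in for a graph query answering 'Who changed PATH most often?'."""
    counts = Counter()
    for sha in subjects(triples, "MODIFIED", path):
        for author in subjects(triples, "AUTHORED", sha):
            counts[author] += 1
    return counts.most_common(1)[0][0] if counts else None


def build_prompt(question, examples):
    """Step 2 (sketch): few-shot chain-of-thought prompt with worked reasoning per example."""
    parts = ["Answer questions about the repository knowledge graph. Think step by step."]
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nReasoning: {reasoning}\nA: {answer}")
    parts.append(f"Q: {question}\nReasoning:")  # the LLM would continue from here
    return "\n\n".join(parts)


# Toy data, invented purely for illustration.
commits = [
    Commit("a1", "alice", ("src/app.py",)),
    Commit("b2", "bob", ("src/app.py", "README.md"), fixes_issue="#12"),
    Commit("c3", "alice", ("src/app.py",)),
]
kg = build_graph(commits)
print(top_contributor_for_file(kg, "src/app.py"))  # alice

EXAMPLES = [
    ("Who most often modified src/app.py?",
     "Find commits with a MODIFIED edge to src/app.py, follow AUTHORED edges back "
     "to their authors, and count commits per author.",
     "alice"),
]
print(build_prompt("Who fixed issue #12?", EXAMPLES))
```

In a fuller system the graph would likely live in a dedicated graph store, with the LLM mapping each question to a traversal; the sketch only illustrates how repository records, graph lookups, and a reasoning-first prompt fit together.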
