Vector Graph-Based Repository Understanding for Issue-Driven File Retrieval
This work addresses a practical challenge for developers: navigating and automating development in large, complex software projects. The contribution appears incremental, since it builds on existing graph and retrieval techniques.
The paper tackles the problem of understanding large software repositories by converting them into a vectorized knowledge graph that captures their architectural and semantic structure. The result is a system that supports partial automation of repository development through hybrid retrieval and LLM-based assistance.
We present a repository decomposition system that converts large software repositories into a vectorized knowledge graph mirroring the project's architectural and semantic structure. By capturing semantic relationships, the graph enables a substantial degree of automation in subsequent repository development. The graph encodes syntactic relations such as containment, implementation, references, calls, and inheritance, and augments nodes with LLM-derived summaries and vector embeddings. A hybrid retrieval pipeline combines semantic retrieval with graph-aware expansion, and an LLM-based assistant formulates constrained, read-only graph requests and produces human-oriented explanations.
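The hybrid retrieval idea (semantic seeds, then expansion along typed graph edges) can be illustrated with a minimal sketch. It rests on several assumptions not taken from the paper: the names `RepoGraph`, `Node`, and `hybrid_retrieve`, the cosine-similarity top-k seeding, the multiplicative `decay` for expanded neighbours, and the `follow` whitelist of relation types are all hypothetical. Toy 3-dimensional vectors stand in for real embeddings and the summary strings stand in for LLM output. The retrieval function only reads the graph, loosely echoing the read-only constraint described above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

# Relation types named in the abstract; these string labels are hypothetical.
EDGE_TYPES = {"contains", "implements", "references", "calls", "inherits"}


@dataclass
class Node:
    node_id: str
    kind: str               # hypothetical taxonomy: "file", "class", "function", ...
    summary: str            # placeholder for an LLM-derived summary
    embedding: list[float]  # placeholder for the node's vector embedding


@dataclass
class RepoGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Adjacency list: node id -> list of (relation type, neighbour id).
    adj: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.adj.setdefault(node.node_id, [])

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        # Both endpoints must already be added; unknown relation types are rejected.
        if rel not in EDGE_TYPES:
            raise ValueError(f"unknown relation type: {rel}")
        # Stored in both directions so expansion can reach e.g. callers and callees.
        self.adj[src].append((rel, dst))
        self.adj[dst].append((rel, src))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_retrieve(graph: RepoGraph, query_vec: list[float], k: int = 2,
                    hops: int = 1, decay: float = 0.5,
                    follow: frozenset[str] = frozenset({"calls", "inherits", "implements"})):
    """Rank node ids by semantic similarity plus graph-aware expansion (read-only)."""
    # Step 1: semantic retrieval, keeping the top-k nodes by cosine similarity.
    sims = {nid: cosine(query_vec, n.embedding) for nid, n in graph.nodes.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:k]
    scores = {nid: sims[nid] for nid in seeds}

    # Step 2: graph-aware expansion along whitelisted relation types, up to `hops`
    # edges away; each neighbour gets a decayed copy of its source node's score.
    frontier = list(scores)
    for _ in range(hops):
        next_frontier = []
        for nid in frontier:
            for rel, nbr in graph.adj[nid]:
                if rel not in follow:
                    continue
                candidate = scores[nid] * decay
                if candidate > scores.get(nbr, float("-inf")):
                    scores[nbr] = candidate
                    next_frontier.append(nbr)
        frontier = next_frontier

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage with a toy, hypothetical repository.
demo = RepoGraph()
demo.add_node(Node("auth.py", "file", "Authentication module", [0.0, 0.0, 1.0]))
demo.add_node(Node("auth.login", "function", "Validates user credentials", [1.0, 0.0, 0.0]))
demo.add_node(Node("crypto.hash_password", "function", "Hashes a password", [0.0, 1.0, 0.0]))
demo.add_edge("auth.py", "contains", "auth.login")
demo.add_edge("auth.login", "calls", "crypto.hash_password")

# A toy "issue" embedding closest to auth.login; the call edge pulls in its callee.
ranked = hybrid_retrieve(demo, [1.0, 0.1, 0.0], k=1, hops=1)
print(ranked)  # auth.login first, then crypto.hash_password at half its score
```

Decaying the neighbour scores keeps the semantically matched seeds ranked above nodes reached only through the graph, and the `follow` whitelist is one simple way to keep expansion bounded. The abstract does not say how the actual pipeline weights or restricts its expansion, so both choices should be read as illustrative.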