SE AI DC

Nov 10, 2025

SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction

Wuyang Zhang, Chenkai Zhang, Zhen Luo, Jianming Ma, Wangming Yuan, Chuqiao Gu, Chenwei Feng

arXiv:2511.07584v15.97 citations

h-index: 1

INNO-PRESS: Journal of Emerging Applied AI

Originality: Highly original

AI Analysis

This work addresses practical deployment issues that developers face in automated software development. It is incremental, however, in that it builds on existing LLM and knowledge graph techniques.

The paper tackles systematic errors in LLM-based code generation, namely logical and schematic hallucinations, by introducing SemanticForge, which integrates semantic knowledge graphs with constraint satisfaction. Its learned graph-query generator achieves 73% precision, compared to 51% for traditional retrieval.

Large language models (LLMs) have transformed software development by enabling automated code generation, yet they frequently suffer from systematic errors that limit practical deployment. We identify two critical failure modes: \textit{logical hallucination} (incorrect control/data-flow reasoning) and \textit{schematic hallucination} (type mismatches, signature violations, and architectural inconsistencies). These errors stem from the absence of explicit, queryable representations of repository-wide semantics. This paper presents \textbf{SemanticForge}, which introduces four fundamental algorithmic advances for semantically aware code generation: (1) a novel automatic reconciliation algorithm for dual static-dynamic knowledge graphs, unifying compile-time and runtime program semantics; (2) a neural approach that learns to generate structured graph queries from natural language, achieving 73\% precision versus 51\% for traditional retrieval; (3) a novel beam search algorithm with integrated SMT solving, enabling real-time constraint verification during generation rather than post-hoc validation; and (4) an incremental maintenance algorithm that updates knowledge graphs in $O(|\Delta R| \cdot \log n)$ time while maintaining semantic equivalence.
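To make advance (3) concrete, the following is a minimal sketch of beam search that checks constraints on every partial candidate during generation, rather than validating finished outputs afterwards. It is not the paper's algorithm: the SMT solver is replaced by a plain Python predicate, and all names here (`constrained_beam_search`, `LITERAL_TYPES`, `SIGNATURE`, `toy_score`, `matches_signature`) are hypothetical, chosen only for illustration.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[float, Tuple[str, ...]]  # (cumulative log-score, tokens so far)


def constrained_beam_search(
    vocab: Sequence[str],
    score: Callable[[Tuple[str, ...], str], float],
    is_feasible: Callable[[Tuple[str, ...]], bool],
    beam_width: int,
    max_len: int,
) -> List[Candidate]:
    """Beam search that prunes infeasible prefixes before they enter the beam."""
    beam: List[Candidate] = [(0.0, ())]
    for _ in range(max_len):
        expanded: List[Candidate] = []
        for total, tokens in beam:
            for tok in vocab:
                nxt = tokens + (tok,)
                # Constraint check happens here, during generation.
                # Assumption: a real system would query an SMT solver instead.
                if not is_feasible(nxt):
                    continue
                expanded.append((total + score(tokens, tok), nxt))
        if not expanded:
            break  # no feasible extension exists; keep the last feasible beam
        beam = heapq.nlargest(beam_width, expanded, key=lambda c: c[0])
    return beam


# Toy setting (hypothetical): generate the arguments of a call whose
# signature expects (int, str). The toy "model" prefers a float literal,
# which would be a type mismatch.
LITERAL_TYPES = {"42": "int", "'a'": "str", "3.5": "float"}
SIGNATURE = ("int", "str")
TOY_LOGPROB = {"3.5": -0.1, "42": -1.0, "'a'": -2.0}


def toy_score(prefix: Tuple[str, ...], tok: str) -> float:
    return TOY_LOGPROB[tok]


def matches_signature(tokens: Tuple[str, ...]) -> bool:
    # A prefix is feasible if its argument types match the signature so far.
    types = tuple(LITERAL_TYPES[t] for t in tokens)
    return types == SIGNATURE[: len(types)]


best = constrained_beam_search(
    list(LITERAL_TYPES), toy_score, matches_signature, beam_width=2, max_len=2
)
```

In this toy run the highest-scoring token, `3.5`, never enters the beam because it violates the signature at the first position, so the only surviving candidate is the well-typed argument list `("42", "'a'")`. Passing `lambda t: True` as `is_feasible` recovers ordinary beam search, which would rank the ill-typed `("3.5", "3.5")` first.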
