Correctness-Guaranteed Code Generation via Constrained Decoding
This work addresses the need for one-shot correctness in domains such as video games and robotics, though the contribution is incremental: it builds on existing constrained decoding and parsing methods.
The paper tackles the challenge of ensuring the correctness of code generated by language models. It presents a constrained decoding algorithm that guarantees semantic correctness through a context-sensitive parser framework, and it demonstrates that the algorithm can generate semantically correct programs for a strongly typed Lua variant as well as runtime-correct game mechanics.
Language models (LMs) are increasingly being used for code generation, but ensuring the correctness of generated programs remains a significant challenge. Although imperfect code may be acceptable during software development with human oversight, domains such as video games and robotics require one-shot correctness for runtime-critical components. We present a constrained decoding algorithm for generating semantically correct programs. The algorithm incorporates a context-sensitive parser that, at each step, outputs a regular expression satisfying a critical non-extensible property; this regular expression guides the generation of the next token sequence so that it can continue to a correct program. To build such a context-sensitive parser, we propose a framework that maintains a dynamic tree of parsers (ToP) during parsing, where each parser corresponds to a modular context-free grammar enriched with contextual information such as variable scopes and type constraints, and tree branches represent ambiguity in the future code segment. We demonstrate our approach through sLua, a strongly typed variant of Lua, showing that our method can generate semantically correct programs conforming to any prescribed scripting API. We further show that, with careful design, our semantic guarantees extend to runtime correctness, as validated in the application of generating game mechanics for a roguelike video game.
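To make the idea of regex-guided decoding concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy setting in which the parser's contextual information is just the set of variable names in scope, the emitted regular expression is an alternation over those names, and the model proposes whole identifiers with scores rather than subword tokens. The function names `in_scope_pattern` and `constrained_pick` and the example scores are hypothetical.

```python
import re


def in_scope_pattern(scope):
    """Build a regex that matches exactly the variable names in `scope`.

    Assumption: this stands in for the regular expression a context-sensitive
    parser would output at one decoding step.
    """
    names = sorted(scope, key=len, reverse=True)
    return re.compile("|".join(re.escape(name) for name in names))


def constrained_pick(scored_candidates, pattern):
    """Return the highest-scoring candidate that fully matches `pattern`.

    `scored_candidates` maps candidate strings to (hypothetical) model scores.
    Candidates rejected by the pattern are masked out before the argmax.
    """
    allowed = [
        (score, text)
        for text, score in scored_candidates.items()
        if pattern.fullmatch(text)
    ]
    if not allowed:
        raise ValueError("no candidate satisfies the constraint")
    return max(allowed)[1]


# The model prefers "health", but only "hp" and "damage" are declared.
scope = {"hp", "damage"}
scores = {"health": 0.6, "hp": 0.3, "damage": 0.1}
chosen = constrained_pick(scores, in_scope_pattern(scope))
print(chosen)
```

Here the unconstrained choice would be the undeclared name `health`; masking by the scope-derived pattern forces the pick to an in-scope variable instead. A real decoder would work at the token level and would need partial-match checks, which Python's `re` module does not provide directly.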