Assessing, Exploiting, and Mitigating Syntactic Robustness Failures in LLM-Based Code Generation
For developers relying on LLM code generation, this work identifies and mitigates a failure mode in which semantically equivalent prompts yield semantically different code, improving reliability.
The paper investigates syntactic robustness in LLM-based code generation when prompts contain mathematical formulas, and finds that LLMs are not robust to semantics-preserving syntactic changes. The authors propose a pre-processing step that reduces formulas to a simplified form, improving syntactic robustness from 54.05% to 74.42%.
Rapid advances in the field of Large Language Models (LLMs) have made LLM-based code generation an important area for investigation. An LLM-based code generator takes a prompt as input and produces code that implements the requirements specified in the prompt. Many software requirements include mathematical formulas that specify the expected behavior of the code to be generated. Given a code generation prompt that contains a mathematical formula, a reasonable expectation is that, if the formula is syntactically modified without changing its semantics, the code generated for the modified prompt should be semantically equivalent to the code generated for the original prompt. We formalize this concept as syntactic robustness and investigate the syntactic robustness of LLMs as code generators. Our experimental assessment demonstrates that LLMs are not syntactically robust for code generation prompts with formulas, especially those that require mathematical reasoning. We investigate attack strategies that can further deteriorate the syntactic robustness of LLMs. Finally, to mitigate syntactic robustness failures in LLMs, we propose a pre-processing step that uses reductions to transform formulas in prompts to a simplified form. Our experimental results demonstrate that this approach significantly improves the syntactic robustness of LLM-based code generation, from 54.05% to 74.42%.
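To make the notion of syntactic robustness concrete, the following minimal Python sketch compares code generated for an original prompt with code generated for a syntactically modified one. It is an illustration under stated assumptions, not the evaluation procedure used in the paper: the three "generated" functions are hand-written stand-ins for LLM output, the formula pair (x + 1)^2 and x^2 + 2x + 1 is a made-up example, and the helper names `semantically_equivalent` and `robustness_rate` are hypothetical. Equivalence is approximated by random sampling, which can only refute equivalence, never prove it.

```python
import math
import random


def semantically_equivalent(f, g, trials=1000, seed=0, tol=1e-9):
    """Approximate equivalence check: True if f and g agree on every sampled input.

    Assumption: both functions take a single float. Sampling cannot prove
    equivalence; it can only fail to find a counterexample.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-100.0, 100.0)
        if not math.isclose(f(x), g(x), rel_tol=tol, abs_tol=tol):
            return False
    return True


# Stand-in for code generated from a prompt containing the formula (x + 1)^2.
def from_original(x):
    return (x + 1) ** 2


# Stand-in for code generated from the rewritten formula x^2 + 2x + 1.
def from_modified_ok(x):
    return x * x + 2 * x + 1


# Stand-in for a faulty generation that drops the constant term.
def from_modified_bad(x):
    return x * x + 2 * x


def robustness_rate(reference, candidates):
    """Fraction of candidates that pass the sampled equivalence check against reference."""
    if not candidates:
        return 0.0
    hits = sum(semantically_equivalent(reference, c) for c in candidates)
    return hits / len(candidates)


if __name__ == "__main__":
    variants = [from_modified_ok, from_modified_bad]
    print(robustness_rate(from_original, variants))
```

In this toy setting, one of the two modified prompts leads to code that disagrees with the reference, so only half of the variants count as robust. A percentage reported over a real benchmark would be computed over many prompts and formula variants, with whatever equivalence criterion the evaluation defines.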