Mara Downing

35.6SEMay 7

Assessing, Exploiting, and Mitigating Syntactic Robustness Failures in LLM-Based Code Generation

Laboni Sarker, Mara Downing, Achintya Desai et al.

Rapid advances in the field of Large Language Models (LLMs) have made LLM-based code generation an important area for investigation. An LLM-based code generator takes a prompt as input and produces code that implements the requirements specified in the prompt. Many software requirements include mathematical formulas that specify the expected behavior of the code to be generated. Given a code generation prompt that contains a mathematical formula, a reasonable expectation is that, if the formula is syntactically modified without changing its semantics, the generated code for the modified prompt should be semantically equivalent. We formalize this concept as syntactic robustness and investigate the syntactic robustness of LLMs as code generators. Our experimental assessment demonstrates that LLMs are not syntactically robust for code generation prompts with formulas, especially for the ones that require mathematical reasoning. We investigate attack strategies that can further deteriorate the syntactic robustness of LLMs. Finally, to mitigate syntactic robustness failures in LLMs, we propose a pre-processing step that uses reductions to transform formulas in prompts to a simplified form. Our experimental results demonstrate that the syntactic robustness of LLM-based code generation improves significantly using our approach, improving syntactic robustness of LLMs from 54.05% to 74.42%.

SEDec 5, 2025

Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation

Mara Downing, Matthew Peng, Jacob Granley et al.

Objective: Machine learning (ML) models are increasingly used to generate electrical stimulation patterns in neuroprosthetic devices such as visual prostheses. While these models promise precise and personalized control, they also introduce new safety risks when model outputs are delivered directly to neural tissue. We propose a systematic, quantitative approach to detect and characterize unsafe stimulation patterns in ML-driven neurostimulation systems. Approach: We adapt an automated software testing technique known as coverage-guided fuzzing to the domain of neural stimulation. Here, fuzzing performs stress testing by perturbing model inputs and tracking whether resulting stimulation violates biophysical limits on charge density, instantaneous current, or electrode co-activation. The framework treats encoders as black boxes and steers exploration with coverage metrics that quantify how broadly test cases span the space of possible outputs and violation types. Main results: Applied to deep stimulus encoders for the retina and cortex, the method systematically reveals diverse stimulation regimes that exceed established safety limits. Two violation-output coverage metrics identify the highest number and diversity of unsafe outputs, enabling interpretable comparisons across architectures and training strategies. Significance: Violation-focused fuzzing reframes safety assessment as an empirical, reproducible process. By transforming safety from a training heuristic into a measurable property of the deployed model, it establishes a foundation for evidence-based benchmarking, regulatory readiness, and ethical assurance in next-generation neural interfaces.

Mara Downing

2 Papers