Guangyuan Wu

55.6SEMay 23

Synthesizing Inductive Invariants for Distributed Protocols via IC3 and Large Language Models

Weining Cao, Guangyuan Wu, Yuan Yao et al.

Distributed protocols are notoriously difficult to verify correctly. Proving safety typically requires inductive invariants that both imply the desired property and are preserved by every protocol transition; yet inferring such invariants remains a major bottleneck: existing approaches either restrict the protocol models to a decidable fragment of first-order logic or demand expert-crafted templates. We present IC3Syn, a neuro-symbolic framework that synthesizes inductive invariants by executing an IC3-style process over TLA+ states with the assistance of Large Language Models (LLMs). At large, IC3Syn combines a symbolic IC3 controller, which decomposes invariant synthesis into focused blocking tasks and an LLM which provides protocol-level reasoning that IC3 alone lacks for TLA+ specifications. This integration enables a disciplined yet flexible search for invariants without imposing logical restrictions or requiring manual templates. We evaluate IC3Syn on 29 distributed protocols spanning consensus, reconfiguration and client-server systems, and compare it against Endive, IC3PO, SWISS and DistAI. IC3Syn discovers candidate invariants for all 29 protocols, including MongoLoglessDynamicRaft (MLDR), an industrial-scale Raft-based reconfiguration protocol for which none of the compared tools reports a solution, as well as one complex Paxos variant. In each case, the invariants synthesized on finite instances are shown in TLAPS to be inductive for the full unbounded protocol, thereby establishing safety.

81.9SEMay 12

Uncertainty Quantification for LLM-based Code Generation

Senrong Xu, Yuhao Tan, Yanke Zhou et al.

Prediction sets provide a theoretically grounded framework for quantifying uncertainty in machine learning models. Adapting them to structured generation tasks, in particular, large language model (LLM) based code generation, remains a challenging problem. An existing attempt proposes PAC prediction sets but is limited by its strong monotonicity assumption on risk and single-label classification framework, which severely limits the space of candidate programs and cannot accommodate the multiple valid outputs inherent to code generation. To address these limitations, we propose an approach RisCoSet that leverages multiple hypothesis testing to construct risk-controlling predictions for LLM-based code generation. Given a trained code generation model, we produce a prediction set represented by a partial program, which is guaranteed to contain a correct solution with high confidence. Extensive experiments on three LLMs demonstrate the effectiveness of the proposed method. For instance, compared with the state-of-the-art, our method can significantly reduce the code removal by up to 24.5%, at the same level of risk.

Guangyuan Wu

2 Papers