Faithful Autoformalization via Roundtrip Verification and Repair
For researchers and practitioners using LLMs for formalization, this provides a method to verify faithfulness without ground-truth annotations, addressing a key reliability bottleneck.
The paper proposes a roundtrip verification approach for autoformalization that checks logical equivalence between original and re-formalized statements, achieving 83-85% formal equivalence on 150 traffic rules, up from 45-61%.
When an LLM formalizes natural language, how do we know the output is faithful? We propose a roundtrip verification approach which does not require ground-truth annotations: formalize a statement, translate the result back to natural language, re-formalize, and use a formal tool to check logical equivalence. When the two formalizations agree, this provides evidence of a faithful formalization. When they disagree, a diagnosis step identifies which translation stage failed, and a targeted repair operator attempts to correct that stage. We evaluate our approach on 150 traffic rules using Claude Opus 4.6 and GPT-5.2. Diagnosis-guided repair raises formal equivalence from 45--61% to 83--85% for both models, outperforming a random-repair baseline. An independent NLI analysis confirms that formal equivalence is correlated with less semantic drift.