KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment
This addresses the problem of autoformalization for researchers and practitioners in formal verification by providing a scalable, verified method, though it is incremental as it builds on existing LLM-based approaches.
The paper tackles the bottleneck of limited multilingual parallel corpora for autoformalization by proposing KELPS, a neuro-symbolic framework that translates informal mathematics into formal languages, resulting in a corpus of over 60,000 problems and achieving 88.9% syntactic accuracy on MiniF2F, outperforming SOTA models.
Modern large language models (LLMs) show promising progress in formalizing informal mathematics into machine-verifiable theorems. However, these methods still face bottlenecks due to the limited quantity and quality of multilingual parallel corpora. In this paper, we propose a novel neuro-symbolic framework KELPS (Knowledge-Equation based Logical Processing System) to address these problems. KELPS is an iterative framework for translating, synthesizing, and filtering informal data into multiple formal languages (Lean, Coq, and Isabelle). First, we translate natural language into Knowledge Equations (KEs), a novel language that we designed, theoretically grounded in assertional logic. Next, we convert them to target languages through rigorously defined rules that preserve both syntactic structure and semantic meaning. This process yielded a parallel corpus of over 60,000 problems. Our framework achieves 88.9% syntactic accuracy (pass@1) on MiniF2F, outperforming SOTA models such as Deepseek-V3 (81%) and Herald (81.3%) across multiple datasets. All datasets and codes are available in the supplementary materials.