CL, AI
Sep 26, 2025

Advancing Natural Language Formalization to First Order Logic with Fine-tuned LLMs

arXiv:2509.22338v1
3 citations
h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses a crucial challenge in knowledge representation and formal methods, though it is incremental in improving existing methods.

The paper tackles the automation of natural language to first-order logic translation by evaluating fine-tuned LLMs. Its best model reaches 70% accuracy when given predicate lists, outperforming GPT-4o and symbolic systems such as ccg2lambda.

Automating the translation of natural language to first-order logic (FOL) is crucial for knowledge representation and formal methods, yet remains challenging. We present a systematic evaluation of fine-tuned LLMs for this task, comparing architectures (encoder-decoder vs. decoder-only) and training strategies. Using the MALLS and Willow datasets, we explore techniques like vocabulary extension, predicate conditioning, and multilingual training, introducing metrics for exact match, logical equivalence, and predicate alignment. Our fine-tuned Flan-T5-XXL achieves 70% accuracy with predicate lists, outperforming GPT-4o and even the DeepSeek-R1-0528 model with CoT reasoning ability as well as symbolic systems like ccg2lambda. Key findings show: (1) predicate availability boosts performance by 15-20%, (2) T5 models surpass larger decoder-only LLMs, and (3) models generalize to unseen logical arguments (FOLIO dataset) without specific training. While structural logic translation proves robust, predicate extraction emerges as the main bottleneck.
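
The abstract names three metrics (exact match, logical equivalence, predicate alignment) without defining them here. The following is a minimal Python sketch of how two of them might be computed, not the authors' implementation. It assumes that exact match ignores whitespace, that predicate names start with an uppercase letter and are directly followed by `(`, and that predicate alignment can be approximated by an F1 score over predicate names. The function names are hypothetical. Logical equivalence is omitted because it would require a theorem prover.

```python
import re


def normalize(formula: str) -> str:
    """Remove all whitespace so spacing differences do not affect the comparison."""
    return re.sub(r"\s+", "", formula)


def exact_match(predicted: str, gold: str) -> bool:
    """Assumed definition: string equality after whitespace normalization."""
    return normalize(predicted) == normalize(gold)


def predicates(formula: str) -> set[str]:
    """Collect names that start with an uppercase letter and precede '('.

    Assumption: predicates are capitalized (Dog, Animal) and variables
    are lowercase (x, y), so variables are never picked up.
    """
    return set(re.findall(r"([A-Z]\w*)\(", formula))


def predicate_f1(predicted: str, gold: str) -> float:
    """Assumed stand-in for predicate alignment: F1 over predicate name sets."""
    p, g = predicates(predicted), predicates(gold)
    if not p and not g:
        return 1.0  # nothing to align on either side
    overlap = len(p & g)
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


gold = "∀x (Dog(x) → Animal(x))"
pred = "∀x (Dog(x) → Mammal(x))"
print(exact_match(pred, gold))   # structure matches, but one predicate differs
print(predicate_f1(pred, gold))  # one of two predicate names is shared
```

In this example the logical structure of the prediction is correct and only a predicate name is wrong, which is the kind of error the abstract identifies as the main bottleneck.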
