Nafi Diallo

CL
h-index31
4papers
22citations
Novelty40%
AI Score39

4 Papers

CLNov 12, 2025
A Neurosymbolic Approach to Natural Language Formalization and Verification

Sam Bayless, Stefano Buliani, Darion Cassel et al.

Large Language Models perform well at natural language interpretation and reasoning, but their inherent stochasticity limits their adoption in regulated industries like finance and healthcare that operate under strict policies. To address this limitation, we present a two-stage neurosymbolic framework that (1) uses LLMs with optional human guidance to formalize natural language policies, allowing fine-grained control of the formalization process, and (2) uses inference-time autoformalization to validate logical correctness of natural language statements against those policies. When correctness is paramount, we perform multiple redundant formalization steps at inference time, cross checking the formalizations for semantic equivalence. Our benchmarks demonstrate that our approach exceeds 99% soundness, indicating a near-zero false positive rate in identifying logical validity. Our approach produces auditable logical artifacts that substantiate the verification outcomes and can be used to improve the original text.

16.0CLApr 20
FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction

Adewale Akinfaderin, Nafi Diallo

We present FregeLogic, a hybrid neuro-symbolic system for SemEval-2026 Task 11 (Subtask 1), which addresses syllogistic validity prediction while reducing content effects on predictions. Our approach combines an ensemble of five LLM classifiers, spanning three open-weights models (Llama 4 Maverick, Llama 4 Scout, and Qwen3-32B) paired with varied prompting strategies, with a Z3 SMT solver that serves as a formal logic tiebreaker. The central hypothesis is that LLM disagreement within the ensemble signals likely content-biased errors, where real-world believability interferes with logical judgment. By deferring to Z3's structurally-grounded formal verification on these disputed cases, our system achieves 94.3% accuracy with a content effect of 2.85 and a combined score of 41.88 in nested 5-fold cross-validation on the dataset (N=960). This represents a 2.76-point improvement in combined score over the pure ensemble (39.12), with a 0.9% accuracy gain, driven by a 16% reduction in content effect (3.39 to 2.85). Adopting structured-output API calls for Z3 extraction reduced failure rates from ~22% to near zero, and an Aristotelian encoding with existence axioms was validated against task annotations. Our results suggest that targeted neuro-symbolic integration, applying formal methods precisely where ensemble consensus is lowest, can improve the combined accuracy-plus-content-effect metric used by this task.

LOJun 7, 2016
Program Derivation by Correctness Enhacements

Nafi Diallo, Wided Ghardallou, Jules Desharnais et al.

Relative correctness is the property of a program to be more-correct than another program with respect to a given specification. Among the many properties of relative correctness, that which we found most intriguing is the property that program P' refines program P if and only if P' is more-correct than P with respect to any specification. This inspires us to reconsider program derivation by successive refinements: each step of this process mandates that we transform a program P into a program P' that refines P, i.e. P' is more-correct than P with respect to any specification. This raises the question: why should we want to make P' more-correct than P with respect to any specification, when we only have to satisfy specification R? In this paper, we discuss a process of program derivation that replaces traditional sequence of refinement-based correctness-preserving transformations starting from specification R by a sequence of relative correctness-based correctness-enhancing transformations starting from abort.

SEJun 1, 2016
Program Repair by Stepwise Correctness Enhancement

Nafi Diallo, Wided Ghardallou, Ali Mili

Relative correctness is the property of a program to be more-correct than another with respect to a given specification. Whereas the traditional definition of (absolute) correctness divides candidate program into two classes (correct, and incorrect), relative correctness arranges candidate programs on the richer structure of a partial ordering. In other venues we discuss the impact of relative correctness on program derivation, and on program verification. In this paper, we discuss the impact of relative correctness on program testing; specifically, we argue that when we remove a fault from a program, we ought to test the new program for relative correctness over the old program, rather than for absolute correctness. We present analytical arguments to support our position, as well as an empirical argument in the form of a small program whose faults are removed in a stepwise manner as its relative correctness rises with each fault removal until we obtain a correct program.