CLAI Sep 1, 2025

Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation

arXiv:2509.01081v2
19 citations
h-index: 8
Has Code
Proceedings of The Third Arabic Natural Language Processing Conference
Originality: Synthesis-oriented
AI Analysis

This work assesses the legal reasoning capabilities of AI models, a question relevant to Islamic law practitioners and researchers. Its contribution is incremental: it applies existing evaluation methods to a new domain.

The paper evaluates seven large language models on Islamic inheritance law using a benchmark of 1,000 multiple-choice questions. o3 and Gemini 2.5 achieved accuracies above 90%, while ALLaM, Fanar, LLaMA, and Mistral scored below 50%.

This paper evaluates the knowledge and reasoning capabilities of Large Language Models in Islamic inheritance law, known as 'ilm al-mawarith. We assess the performance of seven LLMs using a benchmark of 1,000 multiple-choice questions covering diverse inheritance scenarios, designed to test models' ability to understand the inheritance context and compute the distribution of shares prescribed by Islamic jurisprudence. The results reveal a significant performance gap: o3 and Gemini 2.5 achieved accuracies above 90%, whereas ALLaM, Fanar, LLaMA, and Mistral scored below 50%. These disparities reflect important differences in reasoning ability and domain adaptation. We conduct a detailed error analysis to identify recurring failure patterns across models, including misunderstandings of inheritance scenarios, incorrect application of legal rules, and insufficient domain knowledge. Our findings highlight limitations in handling structured legal reasoning and suggest directions for improving performance in Islamic legal reasoning. Code: https://github.com/bouchekif/inheritance_evaluation
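The accuracy figures above come from scoring each model's chosen option against the gold answer of every multiple-choice question. A minimal sketch of that kind of scoring follows. The record layout, the function name `accuracy_by_model`, and the model names are assumptions made for illustration; they are not taken from the paper's repository, and the toy data are invented rather than results from the paper.

```python
from collections import defaultdict


def accuracy_by_model(records):
    """Compute per-model accuracy on multiple-choice answers.

    `records` is an iterable of (model, predicted_label, gold_label)
    tuples. This layout is a hypothetical one chosen for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for model, predicted, gold in records:
        total[model] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if predicted.strip().upper() == gold.strip().upper():
            correct[model] += 1
    return {model: correct[model] / total[model] for model in total}


# Toy data, invented for illustration only (not results from the paper).
toy_records = [
    ("model_a", "A", "A"),
    ("model_a", "c", "C"),
    ("model_a", "B", "D"),
    ("model_b", "A", "A"),
    ("model_b", "D", "D"),
]

scores = accuracy_by_model(toy_records)
print(scores)
```

With a real benchmark, each model would contribute one record per question, so the denominator would be the full question count for every model.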
