CL | AI | Sep 1, 2025

Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation

Abdessalam Bouchekif, Samer Rashwani, Heba Sbahi, Shahd Gaben, Mutaz Al-Khatib, Mohammed Ghaly

arXiv:2509.01081v2 | 19 citations | h-index: 8 | Has Code | Proceedings of The Third Arabic Natural Language Processing Conference

Originality: Synthesis-oriented

AI Analysis

This work assesses how well AI models handle Islamic legal reasoning, a question relevant to Islamic law practitioners and researchers. Its contribution is incremental: it applies existing evaluation methods to a new domain.

The paper evaluates seven large language models on their ability to reason about Islamic inheritance law using a benchmark of 1,000 multiple-choice questions. o3 and Gemini 2.5 achieved accuracies above 90%, while ALLaM, Fanar, LLaMA, and Mistral scored below 50%.

This paper evaluates the knowledge and reasoning capabilities of Large Language Models in Islamic inheritance law, known as 'ilm al-mawarith. We assess the performance of seven LLMs using a benchmark of 1,000 multiple-choice questions covering diverse inheritance scenarios, designed to test models' ability to understand the inheritance context and compute the distribution of shares prescribed by Islamic jurisprudence. The results reveal a significant performance gap: o3 and Gemini 2.5 achieved accuracies above 90%, whereas ALLaM, Fanar, LLaMA, and Mistral scored below 50%. These disparities reflect important differences in reasoning ability and domain adaptation. We conduct a detailed error analysis to identify recurring failure patterns across models, including misunderstandings of inheritance scenarios, incorrect application of legal rules, and insufficient domain knowledge. Our findings highlight limitations in handling structured legal reasoning and suggest directions for improving performance in Islamic legal reasoning. Code: https://github.com/bouchekif/inheritance_evaluation
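To make concrete what "computing the distribution of shares" involves, here is a minimal sketch of one textbook scenario: the deceased leaves one wife plus sons and daughters. It assumes the standard rules that the wife takes 1/8 when there are children and that the remainder is split among the children with each son receiving twice a daughter's share. The function name and the scenario are illustrative assumptions, not taken from the paper's benchmark or code.

```python
from fractions import Fraction


def shares_wife_and_children(sons: int, daughters: int) -> dict[str, Fraction]:
    """Hypothetical helper: shares for one wife plus sons and daughters only.

    Assumes no other heirs, debts, or bequests, and at least one son, so the
    children take the whole remainder as residuaries.
    """
    if sons < 1 or daughters < 0:
        raise ValueError("this sketch covers only cases with at least one son")

    wife = Fraction(1, 8)            # wife's fixed share when children exist
    residue = 1 - wife               # what remains for the children
    units = 2 * sons + daughters     # each son counts as two units
    unit = residue / units           # value of one unit (one daughter's share)

    return {"wife": wife, "each_son": 2 * unit, "each_daughter": unit}


if __name__ == "__main__":
    for heir, share in shares_wife_and_children(sons=1, daughters=1).items():
        print(f"{heir}: {share}")
```

With one son and one daughter, the wife receives 1/8, the son 7/12, and the daughter 7/24, which sum to 1. The benchmark's questions cover far more varied configurations of heirs than this single case.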

