CL · Jul 13, 2025

Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering?

arXiv:2507.09638v1 · 3 citations · h-index: 20
Originality: Incremental advance
AI Analysis

The proposed method provides an effective, resource-efficient way to improve Thai legal LLMs, addressing a domain-specific bottleneck in legal reasoning and question answering.

The paper tackles the limited performance of Retrieval-Augmented Generation (RAG) systems on Thai legal question answering, particularly for questions that require complex reasoning. It aligns LLMs with Group-Relative Policy Optimization (GRPO), using BGE-M3 embeddings as a semantic-similarity reward. The method achieves up to 90% citation-F1 gains over the base model and a 31% increase in joint quality metrics over instruction tuning, while the embedding-based reward cuts computational cost by up to 2.5x relative to large language model judges.
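Citation-F1 is read here as the usual harmonic mean of precision and recall over cited law sections. The sketch below is an illustration under that assumption: it compares citations as sets of identifier strings, which may differ from NitiBench's exact matching rules, and the function name `citation_f1`, the section identifiers, and the convention that two empty sets score 1.0 are all hypothetical.

```python
def citation_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 between predicted and gold law citations (illustrative sketch)."""
    # Assumed convention: citing nothing when nothing should be cited is perfect.
    if not predicted and not gold:
        return 1.0
    # One side empty: no overlap is possible.
    if not predicted or not gold:
        return 0.0
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # share of cited sections that are correct
    recall = true_pos / len(gold)          # share of required sections that were cited
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one of two predicted sections is correct, one of two gold sections is found.
print(citation_f1({"s420", "s421"}, {"s420", "s438"}))  # precision 0.5, recall 0.5
```

A score like this can serve both as an evaluation metric and as a rule-based reward term, because it is cheap to compute for every sampled answer.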

The performance of Retrieval-Augmented Generation (RAG) systems on Thai legal question answering remains limited, especially for questions requiring extensive, complex legal reasoning. To address these limitations, we introduce an approach that aligns LLMs toward improved law citation accuracy and better response quality using Group-Relative Policy Optimization (GRPO). Our approach leverages BGE-M3 embeddings as a cost-efficient semantic-similarity reward, reducing computational cost by up to 2.5x compared to large language model judges. Experiments on the NitiBench benchmark demonstrate substantial improvements: GRPO achieves up to 90% citation-F1 gains over the base model and a 31% increase in joint quality metrics over instruction tuning. Crucially, our method is more robust than instruction tuning on complex legal reasoning tasks, providing an effective and resource-efficient solution for enhancing Thai legal LLMs.
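To illustrate how an embedding-based reward can plug into GRPO, the sketch below scores a group of sampled answers by cosine similarity to a reference answer's embedding. It then normalizes the rewards within the group (subtract the group mean, divide by the group standard deviation), which is the group-relative advantage GRPO uses in place of a learned value function. It assumes embeddings were already computed (with BGE-M3 these would be dense vectors). The toy 3-dimensional vectors, the function names, the `eps` value, and the use of the population standard deviation are illustrative assumptions, not the paper's implementation.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_rewards(candidate_embs: list[list[float]], reference_emb: list[float]) -> list[float]:
    """Reward each sampled answer by how close its embedding is to the reference answer's."""
    return [cosine_similarity(e, reference_emb) for e in candidate_embs]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of three sampled answers (pretend these are precomputed embeddings).
reference = [1.0, 0.0, 0.0]
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
rewards = semantic_rewards(candidates, reference)
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Answers closer to the reference get positive advantages and the rest get negative ones, so the policy update favors them without training a separate critic. Replacing an LLM judge with a single embedding pass per answer is what makes this kind of reward comparatively cheap.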
