IR · Jun 2

Section-Weighted Hybrid Approach for Legal Case Retrieval

arXiv:2606.03138 · 36.5

AI Analysis

For legal professionals needing to find analogous precedents, this work improves retrieval accuracy by incorporating legal reasoning structure, though the gains are incremental over existing hybrid approaches.

The paper tackles legal case retrieval by proposing a two-stage section-aware framework that segments judgments into sections using an LLM, then combines lexical and semantic search with normalized section-weighted scoring. It achieves consistent gains over baselines on a jurisdiction-scale benchmark.
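The paper says that Stage 1 combines the lexical (BM25) and semantic (dense ANN) search results with Reciprocal Rank Fusion (RRF). Below is a minimal Python sketch of standard RRF, where each document scores sum(1 / (k + rank)) over the input rankings. Several things are assumptions. The constant k=60 is the value commonly used in the RRF literature, not one the paper reports. The function name `reciprocal_rank_fusion` and the case ids are hypothetical. The BM25 and ANN searches are replaced by fixed lists.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse best-first ranked lists of document ids with standard RRF.

    Rank positions are 1-based. k=60 is a conventional default, assumed here
    because the paper's value is not given.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical Stage 1 outputs: case ids from BM25 and from dense ANN search.
bm25_hits = ["case_17", "case_03", "case_42", "case_08"]
dense_hits = ["case_03", "case_99", "case_17", "case_51"]

candidate_pool = reciprocal_rank_fusion([bm25_hits, dense_hits], k=60)
for doc_id, score in candidate_pool:
    print(f"{doc_id}\t{score:.5f}")
```

RRF looks only at rank positions, so at this stage the unbounded BM25 scores and the cosine similarities never need a common scale. A case found by both retrievers (here `case_03` and `case_17`) rises above any case found by only one of them, which helps keep the candidate pool high-recall.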

Finding truly analogous precedents requires capturing legal reasoning beyond surface word overlap. We present a two-stage, section-aware framework for legal case retrieval that first segments raw judgments offline into facts, issues, decision, and reasoning sections using a deterministic large language model (LLM). In Stage 1, we combine parallel lexical (BM25) and semantic (dense ANN) whole-document searches via Reciprocal Rank Fusion (RRF) to form a high-recall candidate pool. In Stage 2, we perform fine-grained, like-for-like comparisons between corresponding sections (e.g., query reasoning vs. candidate reasoning). To address the scale mismatch between unbounded lexical scores and bounded cosine similarities, we apply query-wise Z-score normalization before aggregating the signals with learned section weights. For the top results, the system returns the relevant section text with a concise, grounded rationale and party-stance labels. We evaluate on a jurisdiction-scale benchmark, demonstrating consistent gains over strong lexical and neural baselines while maintaining high candidate coverage.
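The Stage 2 aggregation can be illustrated with a short Python sketch. It assumes each candidate already has one score per (section, retriever) pair for a given query. Each signal is Z-scored across that query's candidates, and the results are combined as a weighted sum. Several pieces are assumptions. The weights are made-up placeholders for the paper's learned section weights. The helper names `zscore` and `section_weighted_scores` are hypothetical. Treating each (section, retriever) pair as a separate signal is a guess at how the features are organized.

```python
from statistics import mean, pstdev


def zscore(values):
    """Query-wise Z-score: standardize one signal across a query's candidate pool."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0.0:
        # Every candidate ties on this signal, so it cannot reorder anything.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def section_weighted_scores(raw_scores, weights):
    """Aggregate like-for-like section signals for ONE query.

    raw_scores maps a (section, retriever) signal to one score per candidate,
    where each score compares a query section with the SAME candidate section
    (e.g. query reasoning vs. candidate reasoning). BM25 scores are unbounded
    while cosine similarities lie in [-1, 1], so each signal is Z-scored
    before weighting. `weights` stands in for the learned section weights.
    """
    n_candidates = len(next(iter(raw_scores.values())))
    totals = [0.0] * n_candidates
    for signal, values in raw_scores.items():
        weight = weights.get(signal, 0.0)  # signals without a weight contribute nothing
        for i, z in enumerate(zscore(values)):
            totals[i] += weight * z
    return totals


# Hypothetical Stage 2 input: three candidates from the Stage 1 pool.
candidates = ["case_03", "case_17", "case_99"]
raw_scores = {
    ("reasoning", "bm25"): [14.2, 9.5, 3.3],
    ("reasoning", "dense"): [0.81, 0.84, 0.62],
    ("facts", "bm25"): [6.0, 11.0, 4.0],
    ("facts", "dense"): [0.77, 0.79, 0.70],
}
# Illustrative weights only; the paper learns its section weights.
weights = {
    ("reasoning", "bm25"): 0.3,
    ("reasoning", "dense"): 0.4,
    ("facts", "bm25"): 0.1,
    ("facts", "dense"): 0.2,
}

scores = section_weighted_scores(raw_scores, weights)
for case_id, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{case_id}\t{score:+.3f}")
```

Each signal is standardized within the query, so rescaling one retriever's scores leaves the aggregate unchanged (for example, multiplying every BM25 score by 10). Without this normalization, the large raw BM25 values would swamp the cosine similarities in the weighted sum, which is exactly the scale mismatch described above.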
