SG-LegalCite: A Principle-Augmented Benchmark for Legal Citation Retrieval in Singapore Law
For legal professionals and researchers in common-law systems, particularly Singapore, this work offers a retrieval method that integrates explicit legal principles into queries, so that retrieved precedents are doctrinally relevant rather than merely factually similar.
The paper introduces a new paradigm for legal citation retrieval that incorporates explicit legal principles into queries, addressing the limitation of existing benchmarks whose inputs leave the governing principle missing or only implicit. Experiments on a new dataset of 100,890 case-principle pairs, extracted from 8,523 Singapore Supreme Court judgments, show that this principle-augmented approach is effective across 11 baselines.
Legal citation in common-law systems depends not only on factual similarity, but also on the legal principle for which a precedent is invoked. However, existing benchmarks for legal citation retrieval use case facts, citation context, or full judgments as inputs, in which the governing legal principle is often missing, or only implicitly expressed and entangled with broader context. As a result, models may retrieve precedents that are factually similar yet doctrinally irrelevant. This limitation is particularly consequential in Singapore, where the legal system has evolved independently: only domestic precedents are binding, while foreign authorities serve merely as persuasive references. We therefore propose a new retrieval paradigm that ranks cited cases based on queries integrating case facts and explicit legal principles, inspired by real-world legal reasoning workflows. To support this paradigm, we introduce SG-LegalCite, a dataset of 100,890 case-principle pairs extracted from 8,523 Singapore Supreme Court judgments spanning 2000 to 2025. Experiments across 11 baselines demonstrate the effectiveness of our principle-augmented retrieval paradigm, showing that explicit legal principles provide strong discriminative signals for legal citation retrieval.
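To make the contrast between facts-only and principle-augmented queries concrete, the following is a minimal sketch and not the paper's method. It assumes a plain bag-of-words cosine similarity as a stand-in for the 11 baselines, which are not named here. The candidate texts, the facts, the principle, and all function names are invented for illustration and are not drawn from SG-LegalCite.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, candidates):
    """Return candidate ids sorted by descending similarity to the query."""
    q = Counter(tokenize(query))
    scores = {cid: cosine(q, Counter(tokenize(text)))
              for cid, text in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy candidate pool (hypothetical, for illustration only).
candidates = {
    # Factually similar to the query, but about assessing damages.
    "case_A": "driver collided with pedestrian at junction; "
              "court assessed damages for personal injury",
    # Factually different, but states the principle being invoked.
    "case_B": "duty of care arises where harm is foreseeable and "
              "there is proximity between the parties",
}

facts = "driver collided with cyclist at junction"
principle = ("duty of care requires foreseeable harm and "
             "proximity between the parties")

# Facts-only query: ranks the factually similar case first.
facts_only = rank(facts, candidates)

# Principle-augmented query: facts and principle concatenated (an assumed,
# simplistic way of "integrating" the two).
augmented = rank(facts + " " + principle, candidates)

print("facts only:", facts_only)
print("augmented: ", augmented)
```

In this toy setting the facts-only query ranks `case_A` first, while the augmented query ranks `case_B` first, mirroring the abstract's point that factual similarity alone can surface doctrinally irrelevant precedents. A real system would replace the lexical scorer with whichever sparse or dense retriever is under evaluation.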