CL · AI · Dec 5, 2025

Retrieving Semantically Similar Decisions under Noisy Institutional Labels: Robust Comparison of Embedding Methods

arXiv:2512.05681v1
Originality: Incremental advance
AI Analysis

The paper addresses efficient case law retrieval for legal professionals. Its contribution is incremental: it focuses on evaluation methodology and model comparison rather than introducing a new paradigm.

The paper tackles the problem of retrieving semantically similar legal decisions under noisy institutional labels. It compares a general-purpose OpenAI embedder with a domain-specific BERT model on Czech Constitutional Court data and finds that the OpenAI embedder decisively outperforms BERT, with statistically significant differences in nDCG at both relevance thresholds.

Retrieving case law is a time-consuming task predominantly carried out by querying databases. We provide a comparison of two models in three different settings for Czech Constitutional Court decisions: (i) a large general-purpose embedder (OpenAI), (ii) a domain-specific BERT trained from scratch on ~30,000 decisions using sliding windows and attention pooling. We propose a noise-aware evaluation including IDF-weighted keyword overlap as graded relevance, binarization via two thresholds (0.20 balanced, 0.28 strict), significance via paired bootstrap, and an nDCG diagnosis supported with qualitative analysis. Despite modest absolute nDCG (expected under noisy labels), the general OpenAI embedder decisively outperforms the domain pre-trained BERT in both settings at @10/@20/@100 across both thresholds; differences are statistically significant. Diagnostics attribute low absolutes to label drift and strong ideals rather than lack of utility. Additionally, our framework is robust enough to be used for evaluation under a noisy gold dataset, which is typical when handling data with heterogeneous labels stemming from legacy judicial databases.
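The evaluation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the smoothed IDF formula, the weighted-Jaccard form of the keyword overlap, and the `>=` comparison in the binarization are all assumptions made for illustration. Only the thresholds 0.20 and 0.28 come from the abstract.

```python
import math
import random
from collections import Counter

def idf_weights(keyword_sets):
    """Smoothed IDF per keyword (assumed formula) over per-decision keyword sets."""
    n = len(keyword_sets)
    df = Counter(kw for kws in keyword_sets for kw in set(kws))
    return {kw: math.log((1 + n) / (1 + c)) + 1.0 for kw, c in df.items()}

def weighted_overlap(a, b, idf):
    """IDF-weighted Jaccard overlap in [0, 1]; an assumed form of graded relevance."""
    a, b = set(a), set(b)
    # Keywords missing from the IDF table contribute no weight.
    union = sum(idf.get(k, 0.0) for k in a | b)
    if union == 0:
        return 0.0
    return sum(idf.get(k, 0.0) for k in a & b) / union

def binarize(score, threshold=0.20):
    """Binary relevance; 0.20 (balanced) and 0.28 (strict) are the abstract's thresholds."""
    return 1 if score >= threshold else 0

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_gains, k):
    """nDCG@k, where ranked_gains holds the gain of every candidate in system order."""
    ideal = dcg(sorted(ranked_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0

def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Resample per-query score differences with replacement.

    Returns the observed mean difference (A minus B) and the fraction of
    resamples in which A fails to beat B, a one-sided p-value estimate.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    not_better = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return observed, not_better / n_resamples
```

In such a setup, each query decision would get one nDCG@k score per model, and the two per-query score lists would be passed to `paired_bootstrap` so that both models are always compared on the same resampled queries.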

