CL | AI | LG | Oct 17, 2024

Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland

arXiv:2410.13456v3 | 6 citations | h-index: 12 | Has Code | EMNLP
Originality: Synthesis-oriented
AI Analysis

This paper addresses the high cost of legal research for lawyers in Switzerland by providing a dataset and benchmarks. The contribution is incremental, since it builds on existing summarization methods.

The authors tackled the lack of headnotes in Swiss court decisions by creating the SLDS dataset of 20K rulings with multilingual summaries, and found that fine-tuned models achieve good lexical similarity while larger models like GPT-4o produce more legally accurate summaries.

Legal research depends on headnotes: concise summaries that help lawyers quickly identify relevant cases. Yet, many court decisions lack them due to the high cost of manual annotation. To address this gap, we introduce the Swiss Landmark Decisions Summarization (SLDS) dataset containing 20K rulings from the Swiss Federal Supreme Court, each with headnotes in German, French, and Italian. SLDS has the potential to significantly improve access to legal information and transform legal research in Switzerland. We fine-tune open models (Qwen2.5, Llama 3.2, Phi-3.5) and compare them to larger general-purpose and reasoning-tuned LLMs, including GPT-4o, Claude 3.5 Sonnet, and the open-source DeepSeek R1. Using an LLM-as-a-Judge framework, we find that fine-tuned models perform well in terms of lexical similarity, while larger models generate more legally accurate and coherent summaries. Interestingly, reasoning-focused models show no consistent benefit, suggesting that factual precision is more important than deep reasoning in this task. We release SLDS under a CC BY 4.0 license to support future research in cross-lingual legal summarization.
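The gap the abstract describes between lexical similarity and legal accuracy can be illustrated with a small sketch. The code below computes a simplified ROUGE-1 style F1 score (unigram overlap); it is not the paper's evaluation code, and the function names and example sentences are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 style F1: harmonic mean of unigram precision and recall."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical headnote and two hypothetical model outputs.
reference = "the appeal is dismissed"
faithful = "the appeal is dismissed"
wrong_outcome = "the appeal is upheld"

print(unigram_f1(faithful, reference))
print(unigram_f1(wrong_outcome, reference))
```

In this toy case the second candidate reverses the outcome of the ruling yet still shares three of its four tokens with the reference, so an overlap metric rates it highly. This is one plausible reason to complement lexical metrics with a judgment of legal correctness, as the LLM-as-a-Judge setup in the abstract does.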

