CourtPressGER: A German Court Decision to Press Release Summarization Dataset
The work addresses the need for citizen-oriented communication of judicial rulings in Germany. Its contribution is incremental: it applies existing summarization techniques to a new dataset.
The authors tackle the problem of summarizing German court decisions as press releases for public communication by introducing CourtPressGER, a dataset of 6.4k triples of rulings, press releases, and LLM prompts. Benchmarking LLMs on this dataset, they find that large LLMs produce high-quality drafts and lose little performance when run in a hierarchical setup, that smaller models need hierarchical setups to handle long judgments, and that human-drafted releases rank highest in the evaluations.
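The summary says smaller models need a hierarchical setup for long judgments but does not spell out the procedure. The sketch below assumes a generic map-reduce scheme: split the judgment into chunks, summarize each chunk, then summarize the concatenated partial summaries. A word-count budget stands in for the model's context window. The names `hierarchical_summarize`, `chunk_words`, `max_words`, and `max_depth` are hypothetical, and `summarize` is a placeholder for any LLM call. This is not presented as the authors' exact pipeline.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace tokens."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def hierarchical_summarize(
    text: str,
    summarize: Callable[[str], str],  # placeholder for an LLM call (assumption)
    max_words: int = 1000,            # hypothetical stand-in for the context budget
    max_depth: int = 3,               # guards against summaries that never shrink
) -> str:
    """Map-reduce summarization for inputs longer than the model's budget."""
    # Base case: the input fits in a single call, or the recursion limit is
    # reached (the model then receives the over-long text as a fallback).
    if len(text.split()) <= max_words or max_depth == 0:
        return summarize(text)
    # Map step: summarize each chunk independently.
    partial = [summarize(chunk) for chunk in chunk_words(text, max_words)]
    # Reduce step: recurse on the concatenated partial summaries.
    return hierarchical_summarize("\n".join(partial), summarize, max_words, max_depth - 1)


if __name__ == "__main__":
    # Stub "model" that keeps the first 20 words of its input.
    stub = lambda t: " ".join(t.split()[:20])
    judgment = " ".join(f"w{i}" for i in range(5000))
    print(hierarchical_summarize(judgment, stub))
```

With the stub summarizer, a 5,000-word input is split into five chunks, each reduced to 20 words; the resulting 100-word text fits the budget and is summarized once more. A large model with a long context window would instead take the base case directly.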
Official court press releases from Germany's highest courts present and explain judicial rulings to the public as well as to expert audiences. Prior NLP efforts emphasize technical headnotes and ignore citizen-oriented communication needs. We introduce CourtPressGER, a dataset of 6.4k triples: rulings, human-drafted press releases, and synthetic prompts that instruct LLMs to generate comparable releases. The benchmark supports training and evaluating LLMs on generating accurate, readable summaries of long judicial texts. We benchmark small and large LLMs using reference-based metrics, factual-consistency checks, LLM-as-judge, and expert ranking. Large LLMs produce high-quality drafts and lose little performance when run hierarchically; smaller models require hierarchical setups for long judgments. Initial benchmarks show varying model performance, with human-drafted releases ranking highest.
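The abstract lists reference-based metrics without naming them. As an illustration only, the sketch below assumes ROUGE-1 F1, a common reference-based summarization metric that measures unigram overlap between a generated release and the human-drafted one. It uses plain lowercase whitespace tokenization with no stemming, so scores will differ from standard ROUGE implementations. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated text and a reference (simplified ROUGE-1)."""
    # Assumption: lowercase whitespace tokenization, no stemming or punctuation handling.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: each shared token counts at most min(count_cand, count_ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "Der Senat hat die Revision zurückgewiesen"
    human = "Das Gericht hat die Revision zurückgewiesen"
    # 4 of 6 tokens overlap in each direction, so precision = recall = F1 = 2/3.
    print(round(rouge1_f1(generated, human), 4))
```

Overlap metrics of this kind reward wording close to the reference but say nothing about factual accuracy, which is one reason the abstract pairs them with factual-consistency checks, LLM-as-judge, and expert ranking.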