CL, Feb 6, 2020

Citation Data of Czech Apex Courts

Jakub Harašta, Tereza Novotná, Jaromír Šavelka

arXiv:2002.02224v1

Originality: Synthesis-oriented

AI Analysis

This work provides a domain-specific dataset for legal research in the Czech Republic, but it is incremental as it builds on existing methods for data extraction.

The authors tackled the problem of extracting citation data from Czech court decisions by developing an NLP pipeline to automatically identify court decision identifiers, which was then manually refined to produce a high-quality dataset for analysis.

In this paper, we introduce the citation data of the Czech apex courts (the Supreme Court, the Supreme Administrative Court and the Constitutional Court). The dataset was automatically extracted from CzCDC 1.0, a corpus of texts of Czech court decisions. We obtained the citation data by building a natural language processing pipeline for extracting court decision identifiers. The pipeline consisted of (i) a document segmentation model and (ii) a reference recognition model. The dataset was then manually processed to achieve high-quality citation data as a basis for subsequent qualitative and quantitative analyses. The dataset will be made available to the general public.
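
To make the idea of extracting court decision identifiers concrete, the following is a minimal sketch in Python. It is not the authors' pipeline: the abstract describes trained segmentation and reference recognition models, whereas this sketch swaps in a single regular expression and skips segmentation entirely. The pattern `DOCKET_RE`, the functions `extract_references` and `build_citation_edges`, and the sample text are all hypothetical and cover only a simplified docket-number shape such as "21 Cdo 1234/2015" or "II. ÚS 123/10".

```python
import re

# Hypothetical, simplified pattern for docket-style identifiers
# (senate, register mark, running number, year). Illustrative only;
# real references in court decisions take many more forms.
DOCKET_RE = re.compile(
    r"\b(?P<senate>\d{1,2}|[IVX]{1,4}\.|Pl\.)\s+"
    r"(?P<register>[A-ZÚ][A-Za-zÚú]{1,4})\s+"
    r"(?P<number>\d{1,5})/(?P<year>\d{2,4})"
)


def extract_references(text):
    """Return normalized identifiers found in `text`, in order of appearance."""
    refs = []
    for m in DOCKET_RE.finditer(text):
        # Rebuild the identifier from its parts so whitespace is uniform.
        refs.append(
            f"{m.group('senate')} {m.group('register')} "
            f"{m.group('number')}/{m.group('year')}"
        )
    return refs


def build_citation_edges(decisions):
    """Map {citing_id: text} to a list of (citing_id, cited_id) pairs."""
    edges = []
    for citing_id, text in decisions.items():
        for cited_id in extract_references(text):
            edges.append((citing_id, cited_id))
    return edges


if __name__ == "__main__":
    # Made-up sample sentence, not taken from CzCDC 1.0.
    sample = "Viz rozsudek sp. zn. 21 Cdo 1234/2015 a nález II. ÚS 123/10."
    print(extract_references(sample))
    # ['21 Cdo 1234/2015', 'II. ÚS 123/10']
```

A regular expression like this would miss many real-world variants and produce false positives, which is consistent with the abstract's use of learned models followed by manual processing.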
