LG · ML · Oct 7, 2018

European Court of Human Right Open Data project

arXiv:1810.03115v2 · 0.85 citations

Originality: Synthesis-oriented

AI Analysis

The work provides a reproducible, open-source resource for researchers, data scientists, citizens and legal practitioners, and it addresses data governance concerns in legal AI. Its contribution is incremental, however, since it applies existing methods to new data.

The paper introduces thirteen datasets built from European Court of Human Rights judgments for binary, multiclass and multilabel classification. Baseline binary classification experiments reach accuracies between 75.86% and 98.32%, with an average of 96.45%.

This paper presents thirteen datasets for binary, multiclass and multilabel classification based on the judgments of the European Court of Human Rights since its creation. The value of such datasets is explained from the perspective of the researcher, the data scientist, the citizen and the legal practitioner. Unlike many datasets, the entire creation process, from the collection of raw data to the feature transformation, is provided as a collection of fully automated, open-source scripts. This ensures reproducibility and a high level of confidence in the processed data, which are among the most important issues in data governance today. A first experimental campaign is performed to study some predictability properties and to establish baseline results for popular machine learning algorithms. The results are consistently good across the binary datasets, with accuracies ranging from 75.86% to 98.32% and an average of 96.45%.
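As a rough illustration of how a single collection of judgments can yield binary, multiclass and multilabel tasks, the sketch below derives all three kinds of targets from toy case records. It is a hypothetical sketch, not the ECHR-OD scripts: the record format (a mapping from article number to a violation/no-violation conclusion), the function names `binary_target`, `multiclass_target` and `multilabel_target`, and the rule of skipping multi-article cases in the multiclass task are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical record format (assumption, not the ECHR-OD schema):
# article identifier -> conclusion, where True = "violation", False = "no violation".
Case = Dict[str, bool]


def binary_target(case: Case, article: str) -> Optional[int]:
    """Binary task for one article: 1 = violation, 0 = no violation.

    Returns None when the article is not at stake in the case, so the case
    would simply not belong to that article's binary dataset.
    """
    if article not in case:
        return None
    return int(case[article])


def multiclass_target(case: Case) -> Optional[Tuple[str, bool]]:
    """Multiclass task: one (article, conclusion) class per case.

    Assumption for illustration: cases involving several articles are skipped.
    """
    if len(case) != 1:
        return None
    article, violated = next(iter(case.items()))
    return (article, violated)


def multilabel_target(case: Case) -> Set[str]:
    """Multilabel task: every (article, conclusion) label attached to the case."""
    return {
        f"{article}:{'violation' if violated else 'no-violation'}"
        for article, violated in case.items()
    }


# Toy records (invented for illustration, not real judgments).
cases: List[Case] = [
    {"3": True},
    {"6": True, "13": False},
    {"8": False},
]

print([binary_target(c, "6") for c in cases])    # [None, 1, None]
print([multiclass_target(c) for c in cases])     # [('3', True), None, ('8', False)]
print(sorted(multilabel_target(cases[1])))       # ['13:no-violation', '6:violation']
```

The same toy records feed all three tasks, which mirrors the idea of deriving several datasets from one processed corpus; how the actual datasets group articles and outcomes should be checked against the paper and its scripts.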
