LG, AI | Nov 3, 2021

Building Legal Datasets

arXiv:2111.02034v1 | 4 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge dataset builders face in navigating complex legal requirements to create compliant datasets. The contribution is incremental.

The paper tackles the problem of ensuring machine learning datasets comply with proliferating data protection laws, reviewing legal obligations and offering a framework for building legal datasets.

Data-centric AI calls for better, not just bigger, datasets. As data protection laws with extra-territorial reach proliferate worldwide, ensuring datasets are legal is an increasingly crucial yet overlooked component of "better". To help dataset builders become more willing and able to navigate this complex legal space, this paper reviews key legal obligations surrounding ML datasets, examines the practical impact of data laws on ML pipelines, and offers a framework for building legal datasets.
