CR · LG · Aug 18, 2023

Attesting Distributional Properties of Training Data for Machine Learning

arXiv:2308.09552v4 · 10 citations · h-index: 51
Originality: Incremental advance
AI Analysis

The work addresses ML trustworthiness concerns relevant to regulators and other stakeholders, though it is incremental: it builds on existing property inference and cryptographic techniques.

The paper tackles the problem of verifying that training data meets distributional requirements, such as diversity, without revealing the data. It proposes a hybrid property attestation method that combines property inference with cryptographic mechanisms.

The success of machine learning (ML) has been accompanied by increased concerns about its trustworthiness. Several jurisdictions are preparing ML regulatory frameworks. One such concern is ensuring that model training data has desirable distributional properties for certain sensitive attributes. For example, draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties, such as reflecting diversity of the population. We propose the notion of property attestation allowing a prover (e.g., model trainer) to demonstrate relevant distributional properties of training data to a verifier (e.g., a customer) without revealing the data. We present an effective hybrid property attestation combining property inference with cryptographic mechanisms.
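To make "distributional property" concrete, the sketch below checks whether the empirical distribution of a sensitive attribute in a dataset falls within a tolerance of a target distribution. This is only an illustration of the kind of property a prover might want to attest. It is not the paper's attestation method: it reads the data directly and includes no property inference or cryptographic mechanism. The function names, the `group` attribute, the target ratios, and the tolerance are all hypothetical.

```python
from collections import Counter


def attribute_distribution(records, attribute):
    """Return the empirical distribution of one attribute as {value: fraction}."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def satisfies_property(records, attribute, target, tolerance):
    """Check that every target value's observed fraction is within `tolerance`.

    `target` maps attribute values to desired fractions (hypothetical format).
    Values missing from the data count as an observed fraction of 0.
    """
    observed = attribute_distribution(records, attribute)
    return all(
        abs(observed.get(value, 0.0) - fraction) <= tolerance
        for value, fraction in target.items()
    )


# Hypothetical toy dataset: 48 records in group "a", 52 in group "b".
records = [{"group": "a"}] * 48 + [{"group": "b"}] * 52

# Property: both groups are represented at roughly 50%, within 5 points.
print(satisfies_property(records, "group", {"a": 0.5, "b": 0.5}, tolerance=0.05))
```

In the setting the abstract describes, the verifier cannot run a check like this directly because the data stays with the prover; the sketch only shows what statement about the data is being attested.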

Code Implementations: 1 repo
