LG · AI · CY · IT · ML · Aug 25, 2022

Fundamentals of Task-Agnostic Data Valuation

arXiv:2208.12354v1 · 20 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the challenge data buyers and sellers face in assessing data value in task-agnostic scenarios. The advance is incremental because it builds on existing statistical estimation techniques.

The paper tackles the problem of valuing data without requiring a specific task or validation set, by estimating statistical differences between seller and buyer data through second moments to measure diversity and relevance. The proposed method is validated on real tabular and image datasets, showing it effectively captures these measures without raw data exchange.

We study valuing the data of a data owner/seller for a data seeker/buyer. Data valuation is often carried out for a specific task assuming a particular utility metric, such as test accuracy on a validation set, that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer. We capture these statistical differences through second moments by measuring the diversity and relevance of the seller's data for the buyer; we estimate these measures through queries to the seller without requesting raw data. We design the queries with the proposed approach so that the seller is blind to the buyer's raw data and has no knowledge with which to fabricate query responses that would yield a desired diversity and relevance trade-off. We will show through extensive experiments on real tabular and image datasets that the proposed estimates capture the diversity and relevance of the seller's data for the buyer.
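
The abstract gives no formulas, so the following is only a rough sketch of how a second-moment comparison through queries could look, not the paper's actual estimator. It assumes the buyer sends its top principal directions as queries and the seller answers with the variance of its own data along each direction. The function names (`buyer_query`, `seller_response`, `relevance_and_diversity`) and the two scoring formulas are hypothetical choices made for illustration, and the sketch does not attempt the query design that keeps the seller blind to the buyer's data.

```python
import numpy as np


def buyer_query(buyer_data, k):
    """Buyer side: top-k principal directions and variances of the buyer's data."""
    centered = buyer_data - buyer_data.mean(axis=0)
    cov = centered.T @ centered / (len(buyer_data) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return eigvals[order], eigvecs[:, order]


def seller_response(seller_data, directions):
    """Seller side: variance of the seller's data along each queried direction."""
    centered = seller_data - seller_data.mean(axis=0)
    projected = centered @ directions
    return projected.var(axis=0, ddof=1)


def relevance_and_diversity(buyer_vars, seller_vars, eps=1e-12):
    """Illustrative scores in [0, 1] (assumed formulas, not taken from the source).

    relevance: geometric mean of min/max variance ratios (1 when spectra match).
    diversity: geometric mean of normalized variance gaps (0 when spectra match).
    """
    lo = np.minimum(buyer_vars, seller_vars)
    hi = np.maximum(buyer_vars, seller_vars) + eps
    overlap = lo / hi
    gap = np.abs(buyer_vars - seller_vars) / hi
    relevance = float(np.exp(np.mean(np.log(overlap + eps))))
    diversity = float(np.exp(np.mean(np.log(gap + eps))))
    return relevance, diversity
```

Under these assumptions, only the queried directions and the per-direction variances cross between the two parties; no raw samples are exchanged. A seller whose data matches the buyer's second-moment structure scores high on relevance and low on diversity, while a seller with very different variance along the buyer's directions scores the other way.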
