LG · Mar 20, 2024

DAVED: Data Acquisition via Experimental Design for Data Markets

arXiv:2403.13893v28 citations · h-index: 28 · NIPS
AI Analysis

This work addresses the challenge that data buyers face in decentralized markets, particularly in data-scarce domains like healthcare, by enabling efficient data selection to improve machine learning models.

The paper tackles the problem of selecting valuable data points from sellers in data markets, proposing a federated data acquisition method inspired by linear experimental design that achieves lower prediction error without requiring labeled validation data.

The acquisition of training data is crucial for machine learning applications. Data markets can increase the supply of data, particularly in data-scarce domains such as healthcare, by incentivizing potential data providers to join the market. A major challenge for a data buyer in such a market is choosing the most valuable data points from a data seller. Unlike prior work in data valuation, which assumes centralized data access, we propose a federated approach to the data acquisition problem that is inspired by linear experimental design. Our proposed data acquisition method achieves lower prediction error without requiring labeled validation data and can be optimized in a fast and federated procedure. The key insight of our work is that a method that directly estimates the benefit of acquiring data for test set prediction is particularly compatible with a decentralized market setting.
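To make the experimental-design idea concrete, here is a minimal sketch, not the paper's algorithm. It assumes a ridge-regularized linear model, where the predictive variance at a test point x is proportional to x^T (X_sel^T X_sel + reg·I)^{-1} x, and it greedily picks the seller points that most reduce the average of that quantity over the buyer's unlabeled test points. The function names (`expected_test_error`, `greedy_select`), the `reg` parameter, and the greedy strategy itself are illustrative assumptions. Note that no labels are used anywhere, which is what makes this kind of criterion usable without labeled validation data.

```python
import numpy as np


def expected_test_error(X_sel, X_test, reg=1e-3):
    """Proxy for test error under a linear model (hypothetical helper).

    Returns the mean over test points x of x^T (X_sel^T X_sel + reg*I)^{-1} x.
    """
    d = X_test.shape[1]
    info = X_sel.T @ X_sel + reg * np.eye(d)
    cov = np.linalg.inv(info)
    return float(np.mean(np.einsum("ij,jk,ik->i", X_test, cov, X_test)))


def greedy_select(X_seller, X_test, budget, reg=1e-3):
    """Greedily choose `budget` seller rows that most reduce the proxy above.

    Assumption: plain greedy selection without replacement; the paper's own
    optimization procedure may differ.
    """
    d = X_seller.shape[1]
    inv_info = np.eye(d) / reg  # inverse of reg*I, i.e. no data selected yet
    chosen = []
    for _ in range(budget):
        # Row j of PX is (P x_j)^T, where P = inv_info is symmetric.
        PX = X_seller @ inv_info
        denom = 1.0 + np.einsum("ij,ij->i", PX, X_seller)
        # Sherman-Morrison: adding x_j lowers x_t^T P x_t by
        # (x_t^T P x_j)^2 / (1 + x_j^T P x_j) for every test point x_t.
        cross = X_test @ PX.T  # shape (n_test, n_seller)
        gain = np.mean(cross**2, axis=0) / denom
        if chosen:
            gain[chosen] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gain))
        chosen.append(j)
        # Rank-one update of the inverse information matrix.
        inv_info = inv_info - np.outer(PX[j], PX[j]) / denom[j]
    return chosen
```

In a market setting, a buyer could in principle share only the test covariates (or statistics derived from them) and let each seller score its own points locally, which is one way to read the abstract's claim that a test-prediction-driven criterion suits decentralized selection. That reading is an interpretation, not a description of the paper's protocol.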

