Data Measurements for Decentralized Data Markets
This work addresses the need for efficient and equitable data acquisition in machine learning, though the contribution appears incremental: it builds on existing data market concepts and adds new measurement techniques.
The paper tackles the problem of seller selection in decentralized data markets by proposing federated data measurements for evaluating dataset relevance and diversity, enabling buyers to compare sellers directly without brokers or task-specific models.
Decentralized data markets can provide more equitable forms of data acquisition for machine learning. However, to realize practical marketplaces, efficient techniques for seller selection need to be developed. We propose and benchmark federated data measurements to allow a data buyer to find sellers with relevant and diverse datasets. Diversity and relevance measures enable a buyer to make relative comparisons between sellers without requiring intermediate brokers or training task-dependent models.
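To make the idea of a federated measurement concrete, here is a minimal sketch of one way a buyer and a seller could exchange summary statistics instead of raw data. The text above does not specify how relevance and diversity are computed, so everything in it is an assumption made for illustration: the function names `buyer_summary` and `seller_measurements`, the use of principal directions as the shared summary, and the two formulas (a geometric mean of per-direction variance ratios).

```python
import numpy as np

def buyer_summary(buyer_x, k=2):
    """Buyer side (hypothetical): share a mean and the top-k principal
    directions with their variances, never the raw rows."""
    mean = buyer_x.mean(axis=0)
    centered = buyer_x - mean
    cov = centered.T @ centered / len(buyer_x)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return mean, eigvecs[:, top], eigvals[top]

def seller_measurements(seller_x, mean, components, buyer_var, eps=1e-12):
    """Seller side (hypothetical): project local data onto the buyer's
    directions and return two scalars in [0, 1]."""
    proj = (seller_x - mean) @ components
    seller_var = (proj ** 2).mean(axis=0)    # second moment about the buyer mean
    larger = np.maximum(buyer_var, seller_var) + eps
    k = len(buyer_var)
    # Assumed definitions, not taken from the text above:
    # relevance is high when the variances match in every direction,
    # diversity is high when they differ in every direction.
    relevance = float(np.prod(np.minimum(buyer_var, seller_var) / larger) ** (1.0 / k))
    diversity = float(np.prod(np.abs(buyer_var - seller_var) / larger) ** (1.0 / k))
    return relevance, diversity

# Synthetic data: seller A matches the buyer's distribution, seller B does not.
rng = np.random.default_rng(0)
scale = np.array([3.0, 2.0, 1.0, 0.5])
buyer_x = rng.normal(size=(500, 4)) * scale
seller_a = rng.normal(size=(500, 4)) * scale
seller_b = rng.normal(size=(500, 4)) * 0.1

summary = buyer_summary(buyer_x)
rel_a, div_a = seller_measurements(seller_a, *summary)
rel_b, div_b = seller_measurements(seller_b, *summary)
```

Under these assumed definitions, the buyer can rank sellers by the returned scalars alone. No broker sees the data, and no task-specific model is trained, which is the property the abstract emphasizes.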