Disentangled Structural and Featural Representation for Task-Agnostic Graph Valuation
This work addresses the need for task-agnostic data valuation methods for graphs, which are widely used in fields such as chemistry and social networks. The approach is incremental, however, as it adapts existing concepts such as graph matching and the Wasserstein distance to a new application.
The paper tackles the problem of valuing graph data in marketplaces without task-specific metrics. It decomposes graphs into structural and featural components and introduces a blind message passing framework that uses graph matching and the Wasserstein distance to quantify structural disparities; it then measures featural relevance and diversity between the buyer's and seller's graphs. Experiments on real datasets show that the approach is effective for graph-based data valuation.
With the emergence of data marketplaces, the demand for methods to assess the value of data has increased significantly. While numerous techniques have been proposed for this purpose, none have specifically addressed graphs as the main data modality. Graphs are widely used across various fields, ranging from chemical molecules to social networks. In this study, we break down graphs into two main components: structural and featural, and we focus on evaluating data without relying on specific task-related metrics, making it applicable in practical scenarios where validation requirements may be lacking. We introduce a novel framework called blind message passing, which aligns the seller's and buyer's graphs using a shared node permutation based on graph matching. This allows us to utilize the graph Wasserstein distance to quantify the differences in the structural distribution of graph datasets, called the structural disparities. We then consider featural aspects of buyers' and sellers' graphs for data valuation and capture their statistical similarities and differences, referred to as relevance and diversity, respectively. Our approach ensures that buyers and sellers remain unaware of each other's datasets. Our experiments on real datasets demonstrate the effectiveness of our approach in capturing the relevance, diversity, and structural disparities of seller data for buyers, particularly in graph-based data valuation scenarios.
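The two structural ingredients named above, node alignment by graph matching and a Wasserstein distance between structural distributions, can be illustrated with a minimal sketch. This is not the paper's blind message passing procedure and it provides no privacy guarantee. As assumptions for illustration, it uses exhaustive permutation search as a stand-in for graph matching (feasible only for tiny graphs) and node degrees as a stand-in structural statistic. All function names are hypothetical.

```python
from itertools import permutations


def match_cost(a, b, perm):
    """Count adjacency entries that disagree when node i of `a` is mapped to perm[i] of `b`."""
    n = len(a)
    return sum(a[i][j] != b[perm[i]][perm[j]] for i in range(n) for j in range(n))


def best_permutation(a, b):
    """Exhaustive graph matching (assumption: stand-in for a scalable matcher)."""
    return min(permutations(range(len(a))), key=lambda p: match_cost(a, b, p))


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two empirical 1-D samples.

    Computed as the integral of |CDF_u - CDF_v| over the pooled support.
    """
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = sum(x <= left for x in u) / len(u)
        cdf_v = sum(x <= left for x in v) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def degrees(dataset):
    """Pool the node degrees of every graph in a dataset of adjacency matrices."""
    return [sum(row) for graph in dataset for row in graph]


# A 3-node path, the same path with relabeled nodes, and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
relabeled_path = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

perm = best_permutation(path, relabeled_path)
print("alignment:", perm, "mismatches:", match_cost(path, relabeled_path, perm))
print("structural disparity (path vs triangle):",
      wasserstein_1d(degrees([path]), degrees([triangle])))
```

In this toy setting, the relabeled path aligns to the original with zero mismatches, while the degree-based distance between the path and the triangle is strictly positive. A realistic pipeline would replace the exhaustive search with an approximate matcher and use a richer structural representation than degrees.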