Amir Gilad

DB
h-index: 24
3 papers
1 citation
Novelty: 55%
AI Score: 45

3 Papers

23.2 SI May 27
Efficient Shapley-Based Influence Attribution in Social Networks

Fangzhu Shen, Amir Gilad, Sudeepa Roy

The ubiquity of social platforms has reshaped the way information, behaviors, and advertisements diffuse across networks, with influence propagation often initiated by a small set of "seed" users. While much of the literature emphasizes optimizing seed selection to maximize spread, a critical yet underexplored question remains: how to fairly estimate the contributions of individual seeds "ex-ante", i.e., before the diffusion process occurs? This capability is essential for budget allocation, influencer pricing, and fair, privacy-preserving credit distribution under uncertainty, without relying on ex-post cascade logs that capture only a single execution of influence propagation. We introduce a framework for ex-ante influence attribution based on Shapley values from cooperative game theory, which capture each seed's marginal impact in a principled and equitable manner. Adapting Shapley values to influence propagation raises unique computational challenges due to the stochastic nature of diffusion and the intricate dependencies across network structures. To address these challenges, we design polynomial-time algorithms for the special case of single-step activation that is of independent practical interest, establish a sharp tractability boundary by proving $\#P$-hardness for any propagation beyond one step, and develop approximation algorithms with provable guarantees for the standard IC model as well as time-bounded variants. Empirical evaluation on real-world and synthetic networks demonstrates that our methods are both efficient and effective, offering a practical mechanism for ex-ante influence attribution.
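To make the idea of Shapley-based seed attribution concrete, here is a minimal sketch for the single-step activation setting. It is not the paper's algorithm: it enumerates every coalition of seeds, so it runs in exponential time and only suits tiny seed sets. The function names `one_step_spread` and `shapley_values`, the edge-probability dictionary, and the assumption that each seed independently activates each out-neighbor are all illustrative choices made for this example.

```python
from itertools import combinations
from math import factorial


def one_step_spread(seeds, edges, nodes):
    """Expected number of active nodes after one activation step.

    Assumption: each seed u independently activates neighbor v with
    probability edges[(u, v)]; seeds themselves count as active.
    """
    seeds = set(seeds)
    total = float(len(seeds))
    for v in nodes:
        if v in seeds:
            continue
        # Probability that no seed manages to activate v.
        miss = 1.0
        for u in seeds:
            miss *= 1.0 - edges.get((u, v), 0.0)
        total += 1.0 - miss
    return total


def shapley_values(seeds, edges, nodes):
    """Exact Shapley value of each seed by brute-force enumeration."""
    seeds = list(seeds)
    n = len(seeds)
    phi = {s: 0.0 for s in seeds}
    for s in seeds:
        others = [t for t in seeds if t != s]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                gain = (one_step_spread(coalition + (s,), edges, nodes)
                        - one_step_spread(coalition, edges, nodes))
                phi[s] += weight * gain
    return phi


# Toy network: two seeds that each reach node "c" with probability 0.5.
nodes = ["a", "b", "c"]
edges = {("a", "c"): 0.5, ("b", "c"): 0.5}
phi = shapley_values(["a", "b"], edges, nodes)
```

In this toy network the two seeds are symmetric, so they receive equal credit, and their values sum to the expected spread of the full seed set (the efficiency property of Shapley values).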

36.0 DB May 21
Measuring Database Unfairness via Dependency Quantification Under Differential Privacy

Mariia Vologdin, Yuchao Tao, Amir Gilad

Differential privacy (DP) has become the de facto standard for protecting sensitive data, providing strong guarantees that published statistics or models reveal limited information about any individual. However, privacy noise and restricted data access make it increasingly difficult to assess the fairness and reliability of private datasets. In this paper, we propose a formal framework for quantifying data unfairness under DP. We identify three core desiderata for unfairness measures based on previous work: positivity, monotonicity, and DP computability. We further instantiate them through three complementary measures: (1) a mutual information-based measure with a total variation distance proxy suitable for DP, (2) a data repair-based measure approximated via a reduction to weighted MaxSAT, and (3) a top-$k$ tuple contribution measure that isolates the most influential records in fairness violations. We design privacy-preserving algorithms and analyze their sensitivity, accuracy, and efficiency. Extensive experiments on multiple real-world datasets demonstrate that our proposed measures faithfully approximate their non-private counterparts, effectively quantify bias under privacy constraints, and provide insights for data management.

LG May 29, 2025
Refining Labeling Functions with Limited Labeled Data

Chenjie Li, Amir Gilad, Boris Glavic et al.

Programmatic weak supervision (PWS) significantly reduces human effort for labeling data by combining the outputs of user-provided labeling functions (LFs) on unlabeled datapoints. However, the quality of the generated labels depends directly on the accuracy of the LFs. In this work, we study the problem of fixing LFs based on a small set of labeled examples. Towards this goal, we develop novel techniques for repairing a set of LFs by minimally changing their results on the labeled examples such that the fixed LFs ensure that (i) there is sufficient evidence for the correct label of each labeled datapoint and (ii) the accuracy of each repaired LF is sufficiently high. We model LFs as conditional rules which enables us to refine them, i.e., to selectively change their output for some inputs. We demonstrate experimentally that our system improves the quality of LFs based on surprisingly small sets of labeled datapoints.