Mark P. J. van der Loo

10.4 · SI · Apr 13

The anonymization problem in social networks

Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes

This paper introduces a unified computational framework for the anonymization problem in social networks, where the objective is to maximize node anonymity through graph alterations. We define three variants of the underlying optimization problem: full, partial and budgeted anonymization. In each variant, the objective is to maximize the number of $k$-anonymous nodes, i.e., nodes for which at least $k-1$ other nodes are equivalent under a particular anonymity measure. We propose four new heuristic network anonymization algorithms and implement them in ANO-NET, a reusable computational framework. Experiments on three common graph models and 19 real-world network datasets yield three empirical findings. First, regarding the method of alteration, experiments on graph models show that random edge deletion is more effective than edge rewiring or edge addition. Second, the choice of anonymity measure strongly affects both the initial anonymity of a network and the difficulty of anonymizing it. This highlights the importance of selecting a measure that matches a realistic attacker scenario. Third, comparing the four proposed algorithms with an edge sampling baseline from the literature, we find that an approach which preferentially deletes edges affecting structurally unique nodes consistently outperforms heuristics based solely on network structure. Overall, in full anonymization our best-performing algorithm retains on average 14 times more edges than the baseline, and in the budgeted variant it yields 4.8 times more anonymous nodes than the baseline. It also achieves a better trade-off between anonymity and data utility. This work provides a foundation for the future development of effective network anonymization algorithms.
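To make the objective concrete, here is a minimal sketch assuming node degree as the anonymity measure (a simple choice for illustration; the paper compares several measures) and an undirected graph stored as a dict of neighbor sets. `k_anonymous_nodes` counts the quantity that each problem variant maximizes. `budgeted_deletion` is a hypothetical, simplified heuristic in the spirit of the third finding: it spends an edge budget deleting random edges incident to nodes that are not yet $k$-anonymous. Neither function is part of ANO-NET or reproduces the paper's algorithms.

```python
import random
from collections import Counter


def degrees(adj):
    """Degree of every node (assumed stand-in for an anonymity measure)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def k_anonymous_nodes(adj, k):
    """Nodes whose degree is shared by at least k-1 other nodes."""
    deg = degrees(adj)
    class_size = Counter(deg.values())
    return {v for v, d in deg.items() if class_size[d] >= k}


def budgeted_deletion(adj, k, budget, seed=0):
    """Hypothetical heuristic: delete up to `budget` edges, each chosen
    uniformly among edges incident to a node that is not yet k-anonymous."""
    rng = random.Random(seed)
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    for _ in range(budget):
        anon = k_anonymous_nodes(g, k)
        # Undirected edges touching at least one non-anonymous node.
        candidates = sorted({tuple(sorted((u, w)))
                             for u in g if u not in anon
                             for w in g[u]})
        if not candidates:
            break  # all nodes anonymous, or the unique ones have no edges
        u, w = rng.choice(candidates)
        g[u].discard(w)
        g[w].discard(u)
    return g
```

On the example graph `{0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}`, `k_anonymous_nodes(example, 2)` returns `{1, 2}`. Deleting edge (0, 1) or (0, 2) would make all four nodes 2-anonymous, whereas deleting (0, 3) leaves node 3 isolated and still unique, which illustrates why the choice of edge to delete matters.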

19.9 · SI · May 12

Fuzzy k-anonymity in complex networks

Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes

With the growing availability of large-scale network data, including population-scale social networks, techniques for privacy-aware sharing of network data are becoming increasingly important. While existing $k$-anonymity approaches can model different attacker scenarios, they typically assume that attacker knowledge exactly matches the published network structure. We argue that exact knowledge is often unrealistic and introduce $\phi$-$k$-anonymity, a fuzzy variant of $k$-anonymity in which the parameter $\phi$ captures the level of uncertainty in attacker knowledge. Across a benchmark of $39$ real-world networks, a realistic level of uncertainty ($\phi=5\%$) renders, on average, $64\%$ of previously unique nodes anonymous. To further enhance anonymity, we apply anonymization algorithms under a $5\%$ edge modification budget. While full anonymization is often unattainable under exact $k$-anonymity, with low uncertainty ($\phi=10\%$) our newly proposed Greedy algorithm anonymizes over $99\%$ of the nodes. Uncertainty also enables effective anonymization of dense synthetic graphs that are otherwise difficult to anonymize. Additionally, data utility, in terms of both structural properties and performance on network analysis tasks, is well preserved, with most metrics changing by less than $5\%$. Overall, our findings suggest that modest uncertainty assumptions yield high levels of both anonymity and utility, motivating further research on uncertainty-aware privacy guarantees for network data.
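The abstract does not spell out how $\phi$ enters the equivalence test. The following is a minimal sketch of the general idea of fuzzy matching, assuming a hypothetical rule in which degree is the anonymity measure and two nodes count as indistinguishable when their degrees differ by at most a fraction $\phi$ of the larger degree. It is not the paper's definition of $\phi$-$k$-anonymity, and the function name `fuzzy_k_anonymous` is hypothetical. Because such a tolerance relation is not transitive, the sketch counts similar nodes per node instead of forming equivalence classes.

```python
from collections import Counter


def fuzzy_k_anonymous(degree, k, phi):
    """Hypothetical fuzzy rule: node v is anonymous if at least k-1 other
    nodes have a degree e with |d - e| <= phi * max(d, e), where d is v's
    degree. With phi = 0 this is exact degree-based k-anonymity."""
    counts = Counter(degree.values())  # how many nodes have each degree
    anonymous = set()
    for v, d in degree.items():
        # Nodes (v included) whose degree lies within the tolerance of d.
        similar = sum(c for e, c in counts.items()
                      if abs(d - e) <= phi * max(d, e))
        if similar >= k:  # v itself plus at least k-1 others
            anonymous.add(v)
    return anonymous
```

For the degrees `{0: 10, 1: 11, 2: 50, 3: 3, 4: 3}` and $k=2$, $\phi=0$ yields `{3, 4}`, while $\phi=0.10$ also makes nodes 0 and 1 anonymous because their degrees differ by 1, which is within 10% of 11.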

2 Papers