LGDec 26, 2018
Application-driven Privacy-preserving Data Publishing with Correlated AttributesAria Rezaei, Chaowei Xiao, Jie Gao et al.
Recent advances in computing have allowed for the possibility to collect large amounts of data on personal activities and private living spaces. To address the privacy concerns of users in this environment, we propose a novel framework called PR-GAN that offers privacy-preserving mechanism using generative adversarial networks. Given a target application, PR-GAN automatically modifies the data to hide sensitive attributes -- which may be hidden and can be inferred by machine learning algorithms -- while preserving the data utility in the target application. Unlike prior works, the public's possible knowledge of the correlation between the target application and sensitive attributes is built into our modeling. We formulate our problem as an optimization problem, show that an optimal solution exists and use generative adversarial networks (GAN) to create perturbations. We further show that our method provides privacy guarantees under the Pufferfish framework, an elegant generalization of the differential privacy that allows for the modeling of prior knowledge on data and correlations. Through experiments, we show that our method outperforms conventional methods in effectively hiding the sensitive attributes while guaranteeing high performance in the target application, for both property inference and training purposes. Finally, we demonstrate through further experiments that once our model learns a privacy-preserving task, such as hiding subjects' identity, on a group of individuals, it can perform the same task on a separate group with minimal performance drops.
SIJan 31, 2017
Ties That Bind - Characterizing Classes by Attributes and Social TiesAria Rezaei, Bryan Perozzi, Leman Akoglu
Given a set of attributed subgraphs known to be from different classes, how can we discover their differences? There are many cases where collections of subgraphs may be contrasted against each other. For example, they may be assigned ground truth labels (spam/not-spam), or it may be desired to directly compare the biological networks of different species or compound networks of different chemicals. In this work we introduce the problem of characterizing the differences between attributed subgraphs that belong to different classes. We define this characterization problem as one of partitioning the attributes into as many groups as the number of classes, while maximizing the total attributed quality score of all the given subgraphs. We show that our attribute-to-class assignment problem is NP-hard and an optimal $(1 - 1/e)$-approximation algorithm exists. We also propose two different faster heuristics that are linear-time in the number of attributes and subgraphs. Unlike previous work where only attributes were taken into account for characterization, here we exploit both attributes and social ties (i.e. graph structure). Through extensive experiments, we compare our proposed algorithms, show findings that agree with human intuition on datasets from Amazon co-purchases, Congressional bill sponsorships, and DBLP co-authorships. We also show that our approach of characterizing subgraphs is better suited for sense-making than discriminating classification approaches.