LGJan 31, 2022
Differentially Private Top-k Selection via Canonical Lipschitz MechanismMichael Shekelyan, Grigorios Loukides
Selecting the top-$k$ highest scoring items under differential privacy (DP) is a fundamental task with many applications. This work presents three new results. First, the exponential mechanism, permute-and-flip and report-noisy-max, as well as their oneshot variants, are unified into the Lipschitz mechanism, an additive noise mechanism with a single DP-proof via a mandated Lipschitz property for the noise distribution. Second, this new generalized mechanism is paired with a canonical loss function to obtain the canonical Lipschitz mechanism, which can directly select k-subsets out of $d$ items in $O(dk+d \log d)$ time. The canonical loss function assesses subsets by how many users must change for the subset to become top-$k$. Third, this composition-free approach to subset selection improves utility guarantees by an $Ω(\log k)$ factor compared to one-by-one selection via sequential composition, and our experiments on synthetic and real-world data indicate substantial utility improvements.
DSJan 18, 2021
Maximizing approximately k-submodular functionsLeqian Zheng, Hau Chan, Grigorios Loukides et al.
We introduce the problem of maximizing approximately $k$-submodular functions subject to size constraints. In this problem, one seeks to select $k$-disjoint subsets of a ground set with bounded total size or individual sizes, and maximum utility, given by a function that is "close" to being $k$-submodular. The problem finds applications in tasks such as sensor placement, where one wishes to install $k$ types of sensors whose measurements are noisy, and influence maximization, where one seeks to advertise $k$ topics to users of a social network whose level of influence is uncertain. To deal with the problem, we first provide two natural definitions for approximately $k$-submodular functions and establish a hierarchical relationship between them. Next, we show that simple greedy algorithms offer approximation guarantees for different types of size constraints. Last, we demonstrate experimentally that the greedy algorithms are effective in sensor placement and influence maximization problems.
CRNov 29, 2019
Location histogram privacy by sensitive location hiding and target histogram avoidance/resemblance (extended version)Grigorios Loukides, George Theodorakopoulos
A location histogram is comprised of the number of times a user has visited locations as they move in an area of interest, and it is often obtained from the user in applications such as recommendation and advertising. However, a location histogram that leaves a user's computer or device may threaten privacy when it contains visits to locations that the user does not want to disclose (sensitive locations), or when it can be used to profile the user in a way that leads to price discrimination and unsolicited advertising. Our work introduces two privacy notions to protect a location histogram from these threats: sensitive location hiding, which aims at concealing all visits to sensitive locations, and target avoidance/resemblance, which aims at concealing the similarity/dissimilarity of the user's histogram to a target histogram that corresponds to an undesired/desired profile. We formulate an optimization problem around each notion: Sensitive Location Hiding (SLH), which seeks to construct a histogram that is as similar as possible to the user's histogram but associates all visits with nonsensitive locations, and Target Avoidance/Resemblance (TA/TR), which seeks to construct a histogram that is as dissimilar/similar as possible to a given target histogram but remains useful for getting a good response from the application that analyzes the histogram. We develop an optimal algorithm for each notion and also develop a greedy heuristic for the TA/TR problem. Our experiments demonstrate that all algorithms are effective at preserving the distribution of locations in a histogram and the quality of location recommendation. They also demonstrate that the heuristic produces near-optimal solutions while being orders of magnitude faster than the optimal algorithm for TA/TR.