Milan Lopuhaä-Zwakenberg

h-index: 30

11 papers

257 citations

Novelty: 55%

AI Score: 43

Ranked #79,441 of 205,806 authors (top 39%); #1,900 in CR (top 26%)

11 Papers

LG · Jul 12, 2024

Provable Privacy Advantages of Decentralized Federated Learning via Distributed Optimization

Wenrui Yu, Qiongxiu Li, Milan Lopuhaä-Zwakenberg et al.

Federated learning (FL) emerged as a paradigm designed to improve data privacy by keeping data at its source, which makes privacy a core consideration in FL architectures, whether centralized or decentralized. Recent findings by Pasquini et al. suggest that decentralized FL does not empirically offer any additional privacy or security benefits over centralized FL; our study provides compelling evidence to the contrary. We demonstrate that decentralized FL, when deploying distributed optimization, provides stronger privacy protection than centralized approaches, both theoretically and empirically. The difficulty of quantifying privacy loss across iterative processes has traditionally constrained the theoretical study of FL protocols. We overcome this by conducting a pioneering, in-depth information-theoretic privacy analysis of both frameworks. Our analysis, which considers both eavesdropping and passive adversary models, establishes bounds on privacy leakage. We show information-theoretically that the privacy loss in decentralized FL is upper bounded by the loss in centralized FL. In the centralized case, the local gradients of individual participants are revealed directly; in optimization-based decentralized FL, by contrast, the relevant information consists of differences of local gradients over successive iterations and the aggregate sum of different nodes' gradients over the network. This makes it harder for the adversary to infer private data. To connect our theoretical insights with practice, we present detailed case studies involving logistic regression and deep neural networks. These examples show that while privacy leakage remains comparable for simpler models, complex models such as deep neural networks exhibit lower privacy risk under decentralized FL.

23.7 · DS · Apr 28

Fixed-parameter tractable inference for discrete probabilistic programs, via string diagram algebraisation

Benedikt Peterseim, Milan Lopuhaä-Zwakenberg

Discrete probabilistic programs (DPPs) provide a highly expressive formalism for compactly defining arbitrary finite probabilistic models. This expressivity comes at a price: DPP inference is PSPACE-hard. In this work, we show that DPP inference takes only polynomial time for programs that are 'structurally simple'. More precisely, inference can be performed in polynomial time when the primal graph of each function appearing in the probabilistic program has bounded treewidth and the inverse acceptance probability is at most exponential in the size of the program. Existing algorithms do not achieve this performance guarantee. Our method relies on finding suitable decompositions, called algebraisations, of the string diagrams underlying DPPs, using existing algorithms for tree decompositions. This technique is independent of the probabilistic setting of DPPs and applies directly to many other problems, such as evaluating queries on relational databases and assessing cybersecurity risk via attack trees.
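
Why bounded treewidth helps can be seen in the simplest case, a chain of variables (treewidth 1). The sketch below is ordinary variable elimination on a hypothetical chain-structured model with an illustrative pairwise factor `phi`; it is not the paper's string-diagram algebraisation, only a toy instance of the same intuition under our own assumptions: summing out one variable at a time keeps every intermediate table small, so the cost grows linearly rather than exponentially with the chain length.

```python
import itertools

def phi(a: int, b: int) -> float:
    """Hypothetical pairwise factor between neighbouring binary variables."""
    return 2.0 if a == b else 1.0

def brute_force_sum(n: int) -> float:
    """Sum over all 2**n assignments of prod_i phi(x_i, x_{i+1}); exponential in n."""
    total = 0.0
    for xs in itertools.product((0, 1), repeat=n):
        weight = 1.0
        for i in range(n - 1):
            weight *= phi(xs[i], xs[i + 1])
        total += weight
    return total

def eliminate_chain(n: int) -> float:
    """Same sum, eliminating x_1, x_2, ... in turn; linear in n."""
    # msg[b]: sum over the already-eliminated prefix of the product of its
    # factors, given that the current variable takes value b.
    msg = [1.0, 1.0]
    for _ in range(n - 1):
        msg = [sum(msg[a] * phi(a, b) for a in (0, 1)) for b in (0, 1)]
    return sum(msg)

for n in (1, 2, 5, 10):
    print(n, brute_force_sum(n), eliminate_chain(n))
```

Both columns agree, but only the elimination version stays cheap as `n` grows; on graphs of larger treewidth the intermediate tables grow exponentially in the width, not in the number of variables.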

AI · Mar 13, 2024

Fuzzy Fault Trees Formalized

Thi Kim Nhung Dang, Milan Lopuhaä-Zwakenberg, Mariëlle Stoelinga

Fault tree analysis is a vital method for assessing safety risks. It helps identify potential causes of accidents, assess their likelihood and severity, and suggest preventive measures. Quantitative analysis of fault trees is often done via dependability metrics that compute a system's failure behaviour over time. However, the lack of precise data is a major obstacle to quantitative analysis, and hence to reliability analysis. Fuzzy logic is a popular framework for dealing with ambiguous values and has applications in many domains. A number of fuzzy approaches to fault tree analysis have been proposed, but, to the best of our knowledge, none of them provides rigorous definitions or algorithms for computing fuzzy unreliability values. In this paper, we define a rigorous framework for fuzzy unreliability values. In addition, we provide a bottom-up algorithm to efficiently calculate fuzzy reliability for a system. The algorithm uses the $\alpha$-cut method: binary algebraic operations on intervals are performed on horizontally discretised $\alpha$-cut representations of fuzzy numbers. The method preserves the nonlinearity of fuzzy unreliability. Finally, we illustrate the approach with results from two case studies.
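
The bottom-up $\alpha$-cut idea can be sketched as follows, under assumptions of our own rather than the paper's definitions: basic events are independent, their unreliabilities are triangular fuzzy numbers, and the tree (`OR(e1, AND(e2, e3))`) and its values are hypothetical. Because both gate functions are non-decreasing in every input probability, each output interval is obtained by applying the gate to the interval endpoints.

```python
from math import prod

Interval = tuple[float, float]
Triangular = tuple[float, float, float]  # (lower, mode, upper)

def alpha_cut(t: Triangular, alpha: float) -> Interval:
    """Interval of values whose membership in triangular fuzzy number t is >= alpha."""
    lo, mode, hi = t
    return (lo + alpha * (mode - lo), hi - alpha * (hi - mode))

def and_gate(cuts: list[Interval]) -> Interval:
    # P(all inputs fail) = prod p_i is non-decreasing in every p_i,
    # so the image of the input intervals is given by their endpoints.
    return (prod(lo for lo, _ in cuts), prod(hi for _, hi in cuts))

def or_gate(cuts: list[Interval]) -> Interval:
    # P(at least one input fails) = 1 - prod(1 - p_i), also non-decreasing.
    return (1 - prod(1 - lo for lo, _ in cuts), 1 - prod(1 - hi for _, hi in cuts))

# Hypothetical system: TOP = OR(e1, AND(e2, e3)), independent basic events.
e1: Triangular = (0.01, 0.02, 0.03)
e2: Triangular = (0.10, 0.20, 0.30)
e3: Triangular = (0.10, 0.20, 0.30)

def top_cut(alpha: float) -> Interval:
    """Bottom-up evaluation of the top event's alpha-cut."""
    sub = and_gate([alpha_cut(e2, alpha), alpha_cut(e3, alpha)])
    return or_gate([alpha_cut(e1, alpha), sub])

for alpha in (0.0, 0.5, 1.0):  # horizontal discretisation of the membership axis
    print(alpha, top_cut(alpha))
```

At $\alpha = 1$ the interval collapses to the crisp value $1 - 0.98 \cdot 0.96 = 0.0592$, and lower $\alpha$ levels give wider, nested intervals; stacking them reconstructs an approximation of the fuzzy unreliability of the top event.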

CR · Nov 9, 2021

Attack time analysis in dynamic attack trees via integer linear programming

Milan Lopuhaä-Zwakenberg, Mariëlle Stoelinga

Attack trees (ATs) are an important tool in security analysis, and computing metrics is a key part of AT analysis. However, metric computation is NP-complete in general. In this paper, we showcase the use of mixed integer linear programming (MILP) as a tool for quantitative analysis. Specifically, we use MILP to solve the open problem of calculating the min time metric of dynamic ATs, i.e., the minimal time needed to attack a system. We also present two further techniques that improve our MILP method. First, we show how the computation can be sped up by identifying the modules of an AT, i.e., subtrees connected to the rest of the AT via only one node. Second, we define a general semantics for dynamic ATs that significantly relaxes the restrictions on attack trees compared to earlier work, allowing us to apply our methods to a wide variety of ATs. Experiments on a synthetic test set of large ATs show that both the integer linear programming approach and modular analysis considerably decrease the computation time of attack time analysis.
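
For intuition about the min time metric, the sketch below handles only the easy special case: a tree with no shared subtrees, under an assumed semantics in which OR picks the fastest child, AND attacks its children in parallel, and SAND (sequential AND) attacks them one after another. The node kinds, durations, and example tree are hypothetical, and we make no claim about ATs with shared nodes; the general case is what the paper's MILP formulation addresses.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # "BAS", "OR", "AND", or "SAND" (sequential AND)
    duration: float = 0.0     # time needed for a basic attack step (BAS only)
    children: list["Node"] = field(default_factory=list)

def min_time(node: Node) -> float:
    """Minimal attack time, assuming a tree (no shared subtrees)."""
    if node.kind == "BAS":
        return node.duration
    times = [min_time(child) for child in node.children]
    if node.kind == "OR":    # the attacker picks the fastest alternative
        return min(times)
    if node.kind == "AND":   # children are attacked in parallel
        return max(times)
    if node.kind == "SAND":  # children are attacked one after another
        return sum(times)
    raise ValueError(f"unknown node kind: {node.kind}")

def bas(t: float) -> Node:
    return Node("BAS", duration=t)

# Hypothetical AT: OR(SAND(a, b), AND(c, d)) with made-up durations.
root = Node("OR", children=[
    Node("SAND", children=[bas(3), bas(4)]),
    Node("AND", children=[bas(5), bas(2)]),
])
print(min_time(root))  # SAND branch: 3 + 4 = 7; AND branch: max(5, 2) = 5 -> 5
```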

CR · Jan 22, 2021

The Privacy-Utility Tradeoff of Robust Local Differential Privacy

Milan Lopuhaä-Zwakenberg, Jasper Goseling

We consider data release protocols for data $X=(S,U)$, where $S$ is sensitive; the released data $Y$ should contain as much information about $X$ as possible, measured as $\operatorname{I}(X;Y)$, without leaking too much about $S$. We introduce the Robust Local Differential Privacy (RLDP) framework to measure privacy. This framework relies on the underlying distribution of the data, which needs to be estimated from available data. Robust privacy guarantees ensure privacy for all distributions in a given set $\mathcal{F}$; we study two cases: when $\mathcal{F}$ is the set of all distributions, and when $\mathcal{F}$ is a confidence set arising from a $\chi^2$ test on a publicly available dataset. In the former case, we introduce a new release protocol which we prove to be optimal in the low privacy regime. In the latter case, we present four algorithms that construct RLDP protocols from a given dataset. One of these approximates $\mathcal{F}$ by a polytope and uses results from robust optimisation to yield high-utility release protocols. However, this algorithm relies on vertex enumeration and becomes computationally infeasible for large input spaces. The other three algorithms have low complexity and build on randomised response. Experiments verify that all four algorithms offer significantly improved utility over regular LDP.
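
In the confidence-set case, a robust protocol must satisfy the privacy constraint for every candidate distribution that the $\chi^2$ test does not reject. The sketch below shows only that membership test, under assumptions of ours: Pearson's statistic, a 95% level, and a hypothetical vector of counts from the public dataset. It does not construct any of the four protocols described above.

```python
def pearson_chi2(counts: list[int], p: list[float]) -> float:
    """Pearson's chi-squared statistic of observed counts against candidate p (all p_i > 0)."""
    n = sum(counts)
    return sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, p))

def in_confidence_set(counts: list[int], p: list[float], threshold: float) -> bool:
    """p belongs to the confidence set if the chi-squared test does not reject it."""
    return pearson_chi2(counts, p) <= threshold

# 95% quantile of chi-squared with k - 1 = 2 degrees of freedom, about 5.991
# (e.g. scipy.stats.chi2.ppf(0.95, df=2)); hard-coded here to stay stdlib-only.
THRESHOLD = 5.991

counts = [50, 30, 20]  # hypothetical public dataset over a 3-letter alphabet
print(in_confidence_set(counts, [0.5, 0.3, 0.2], THRESHOLD))    # statistic is 0
print(in_confidence_set(counts, [1/3, 1/3, 1/3], THRESHOLD))    # statistic is about 14
```

The set accepted by this test is generally not a polytope, which is consistent with the abstract's remark that one algorithm first approximates $\mathcal{F}$ by a polytope.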

CR · Aug 30, 2020

Data Sanitisation Protocols for the Privacy Funnel with Differential Privacy Guarantees

Milan Lopuhaä-Zwakenberg, Haochen Tong, Boris Škorić

In the Open Data approach, governments and other public organisations want to share their datasets with the public, for accountability and to support participation. Data must be opened in such a way that individual privacy is safeguarded. The Privacy Funnel is a mathematical approach that produces a sanitised database that does not leak private data beyond a chosen threshold. The downsides to this approach are that it does not give worst-case privacy guarantees, and that finding optimal sanitisation protocols can be computationally prohibitive. We tackle these problems by using differential privacy metrics, and by considering local protocols which operate on one entry at a time. We show that under both the Local Differential Privacy and Local Information Privacy (LIP) leakage metrics, one can efficiently obtain optimal protocols. Furthermore, LIP is both more closely aligned with the privacy requirements of the Privacy Funnel scenario and more efficiently computable. We also consider the scenario where each user has multiple attributes, for which we define Side-channel Resistant Local Information Privacy, and we give efficient methods to find protocols satisfying this criterion while still offering good utility. Finally, we introduce Conditional Reporting, an explicit LIP protocol that can be used when the optimal protocol is infeasible to compute. Experiments on real-world and synthetic data confirm the validity of these methods.

CR · Feb 4, 2020

The Privacy Funnel from the viewpoint of Local Differential Privacy

Milan Lopuhaä-Zwakenberg

We consider a database $\vec{X} = (X_1,\cdots,X_n)$ containing the data of $n$ users. The data aggregator wants to publicise the database, but wishes to sanitise it to hide sensitive data $S_i$ correlated with $X_i$. This setting is considered in the Privacy Funnel, which uses mutual information as a leakage metric. The downsides to this approach are that mutual information does not give worst-case guarantees, and that finding optimal sanitisation protocols can be computationally prohibitive. We tackle these problems by using differential privacy metrics, and by considering local protocols which operate on one entry at a time. We show that under both the Local Differential Privacy and Local Information Privacy leakage metrics, one can efficiently obtain optimal protocols; however, Local Information Privacy is both more closely aligned with the privacy requirements of the Privacy Funnel scenario and more efficiently computable. We also consider the scenario where each user has multiple attributes (i.e., $X_i = (X^1_i,\cdots,X^m_i)$), for which we define Side-channel Resistant Local Information Privacy, and we give efficient methods to find protocols satisfying this criterion while still offering good utility. Exploratory experiments confirm the validity of these methods.
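
To make the LIP metric concrete, the sketch below assumes the commonly used definition (our assumption; the paper's exact formulation may differ): a protocol is $\varepsilon$-LIP if $e^{-\varepsilon} \le P(Y=y \mid S=s)/P(Y=y) \le e^{\varepsilon}$ for all $s, y$. It computes the smallest such $\varepsilon$ for a protocol given as a matrix `q[x][y]` and a joint distribution `p_sx[s][x]`; the example values are hypothetical.

```python
import math

def lip_level(p_sx: list[list[float]], q: list[list[float]]) -> float:
    """Smallest eps with e^-eps <= P(y|s) / P(y) <= e^eps for all s, y.

    p_sx[s][x] is the joint distribution of (S, X); q[x][y] = P(Y=y | X=x) is
    the sanitisation protocol. Assumes every P(s) and P(y | s) is positive.
    """
    n_x, n_y, n_s = len(q), len(q[0]), len(p_sx)
    p_s = [sum(row) for row in p_sx]
    p_y_given_s = [
        [sum(p_sx[s][x] * q[x][y] for x in range(n_x)) / p_s[s] for y in range(n_y)]
        for s in range(n_s)
    ]
    p_y = [sum(p_s[s] * p_y_given_s[s][y] for s in range(n_s)) for y in range(n_y)]
    return max(
        abs(math.log(p_y_given_s[s][y] / p_y[y]))
        for s in range(n_s)
        for y in range(n_y)
    )

# Hypothetical example: S = X is a uniform bit, released via randomised response
# that keeps the bit with probability 3/4 (a ln 3-LDP protocol).
p_sx = [[0.5, 0.0], [0.0, 0.5]]
q = [[0.75, 0.25], [0.25, 0.75]]
print(lip_level(p_sx, q), math.log(2))
```

Here the same protocol that is only $\ln 3$-LDP turns out to be $\ln 2$-LIP, because LIP takes the prior distribution into account.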

CR · Dec 2, 2019

Estimating Numerical Distributions under Local Differential Privacy

Zitao Li, Tianhao Wang, Milan Lopuhaä-Zwakenberg et al.

When collecting information, local differential privacy (LDP) alleviates users' concerns about privacy leakage, because each user's private information is randomized before being sent to the aggregator. We study the problem of recovering the distribution over a numerical domain while satisfying LDP. While one can discretize a numerical domain and then apply the protocols developed for categorical domains, we show that taking advantage of the numerical nature of the domain results in a better trade-off between privacy and utility. We introduce a new reporting mechanism, called the square wave (SW) mechanism, which exploits the numerical nature of the domain in reporting. We also develop an Expectation Maximization with Smoothing (EMS) algorithm, which is applied to aggregated histograms from the SW mechanism to estimate the original distributions. Extensive experiments demonstrate that our proposed approach, SW with EMS, consistently outperforms other methods in a variety of utility metrics.
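
The idea of a square-wave report, where outputs close to the true value are more likely than distant ones, can be sketched as below. We assume inputs in $[0, 1]$, outputs in $[-b, 1+b]$, and density $p$ inside the window $[v-b, v+b]$ versus $q$ elsewhere with $p/q = e^{\varepsilon}$; the half-width `b` is left as a free parameter here (the paper chooses it to optimise utility, and that choice is not reproduced). The EMS reconstruction step is not shown.

```python
import math
import random

def sw_perturb(v: float, eps: float, b: float, rng: random.Random) -> float:
    """Square-wave style report of v in [0, 1]; the output lies in [-b, 1 + b].

    The output density is p on the window [v - b, v + b] and q elsewhere,
    with p / q = e^eps, so any b > 0 yields eps-LDP.
    """
    e = math.exp(eps)
    p = e / (2 * b * e + 1)          # density inside the window
    if rng.random() < 2 * b * p:      # total probability mass of the window
        return v - b + 2 * b * rng.random()
    # Otherwise report uniformly from [-b, v - b) U [v + b, 1 + b], total length 1.
    u = rng.random()
    return -b + u if u < v else v + b + (u - v)

rng = random.Random(0)
eps, b, v = 1.0, 0.25, 0.3  # hypothetical privacy budget, window half-width, input
reports = [sw_perturb(v, eps, b, rng) for _ in range(20_000)]
near = sum(abs(y - v) <= b for y in reports) / len(reports)
print(near)  # should be close to 2 * b * p, roughly 0.58 for these settings
```

Unlike discretise-then-randomise approaches, a report that misses the window still carries no information beyond the privacy budget, while reports inside it cluster around the true value, which is what the aggregator later exploits when reconstructing the distribution.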

CR · Nov 24, 2019

Improving Frequency Estimation under Local Differential Privacy

Milan Lopuhaä-Zwakenberg, Zitao Li, Boris Škorić et al.

Local Differential Privacy protocols are stochastic protocols used in data aggregation when individual users do not trust the data aggregator with their private data. In such protocols there is a fundamental tradeoff between user privacy and aggregator utility. In the setting of frequency estimation, established bounds on this tradeoff are either nonquantitative, or far from what is known to be attainable. In this paper, we use information-theoretical methods to significantly improve established bounds. We also show that the new bounds are attainable for binary inputs. Furthermore, our methods lead to improved frequency estimators, which we experimentally show to outperform state-of-the-art methods.
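
The frequency-estimation setting can be illustrated with generalised randomised response (GRR), a standard baseline protocol, together with its usual unbiased estimator. This is not the paper's improved estimator; the domain size, privacy budget, and synthetic data below are hypothetical.

```python
import math
import random

def grr_perturb(v: int, k: int, eps: float, rng: random.Random) -> int:
    """Generalised randomised response over the domain {0, ..., k-1}."""
    e = math.exp(eps)
    if rng.random() < e / (e + k - 1):  # keep the true value with probability p
        return v
    other = rng.randrange(k - 1)        # otherwise a uniformly random other value
    return other if other < v else other + 1

def grr_estimate(reports: list[int], k: int, eps: float) -> list[float]:
    """Standard unbiased frequency estimator: (observed share - q) / (p - q)."""
    e = math.exp(eps)
    p, q = e / (e + k - 1), 1 / (e + k - 1)
    n = len(reports)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c / n - q) / (p - q) for c in counts]

rng = random.Random(1)
k, eps = 4, 2.0
true_freq = [0.4, 0.3, 0.2, 0.1]
data = [v for v, f in enumerate(true_freq) for _ in range(int(f * 50_000))]
reports = [grr_perturb(v, k, eps, rng) for v in data]
print(grr_estimate(reports, k, eps))
```

The estimates always sum to one, but individual estimates can fall below zero or above one for rare or very common values, since they are unbiased rather than constrained.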

CR · Oct 17, 2019

Information-theoretic metrics for Local Differential Privacy protocols

Milan Lopuhaä-Zwakenberg, Boris Škorić, Ninghui Li

Local Differential Privacy (LDP) protocols allow an aggregator to obtain population statistics about sensitive data of a user base, while protecting the privacy of the individual users. To understand the tradeoff between aggregator utility and user privacy, we introduce new information-theoretic metrics for utility and privacy. Contrary to other LDP metrics, these metrics highlight the fact that the users and the aggregator are interested in fundamentally different domains of information. We show how our metrics relate to $\varepsilon$-LDP, the de facto standard privacy metric, giving an information-theoretic interpretation to the latter. Furthermore, we use our metrics to quantitatively study the privacy-utility tradeoff for a number of popular protocols.
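
The paper's own metrics are not reproduced here. As a point of reference only, the sketch below computes two standard quantities for a protocol given as a channel matrix `q[x][y]` $= P(Y=y \mid X=x)$: the smallest $\varepsilon$ for which it is $\varepsilon$-LDP, and the Shannon mutual information $I(X;Y)$ for a given input distribution. The binary randomised-response example is hypothetical.

```python
import math

def ldp_epsilon(q: list[list[float]]) -> float:
    """Smallest eps with Q[x][y] <= e^eps * Q[x'][y] for all x, x', y (assumes Q > 0)."""
    return max(
        math.log(max(row[y] for row in q) / min(row[y] for row in q))
        for y in range(len(q[0]))
    )

def mutual_information(p_x: list[float], q: list[list[float]]) -> float:
    """Shannon mutual information I(X;Y) in bits for input p_x and channel q[x][y]."""
    n_y = len(q[0])
    p_y = [sum(px * row[y] for px, row in zip(p_x, q)) for y in range(n_y)]
    return sum(
        px * row[y] * math.log2(row[y] / p_y[y])
        for px, row in zip(p_x, q)
        for y in range(n_y)
        if px * row[y] > 0
    )

# Hypothetical example: binary randomised response keeping the bit with prob. 3/4.
q = [[0.75, 0.25], [0.25, 0.75]]
print(ldp_epsilon(q))                     # ln 3, about 1.0986
print(mutual_information([0.5, 0.5], q))  # 1 - h(1/4), about 0.1887 bits
```

$\varepsilon$-LDP is a worst-case property of the channel alone, whereas mutual information also depends on the input distribution; this difference is one reason a single number rarely captures both the users' and the aggregator's perspective.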

CR · May 20, 2019

Locally Differentially Private Frequency Estimation with Consistency

Tianhao Wang, Milan Lopuhaä-Zwakenberg, Zitao Li et al.

Local Differential Privacy (LDP) protects user privacy from the data collector. LDP protocols have been increasingly deployed in industry. Frequency oracle (FO) protocols, which estimate the frequencies of values, are a basic building block. While several FO protocols have been proposed, their design goal does not lead to optimal results for answering many queries. In this paper, we show that adding post-processing steps to FO protocols, exploiting the knowledge that all individual frequencies should be non-negative and sum to one, can lead to significantly better accuracy for a wide range of tasks, including frequencies of individual values, frequencies of the most frequent values, and frequencies of subsets of values. We consider 10 different methods that exploit this knowledge differently. We establish theoretical relationships between some of them and conduct extensive experimental evaluations to understand which methods should be used for different query tasks.
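
One natural way to impose both constraints (non-negativity and summing to one) is Euclidean projection onto the probability simplex. The sketch below implements that projection with the standard sort-based rule; it is our illustrative choice, and we do not claim it coincides with any specific one of the ten methods compared in the paper. The noisy input vector is hypothetical.

```python
def project_to_simplex(est: list[float]) -> list[float]:
    """Euclidean projection of est onto {f : f_i >= 0, sum_i f_i = 1}.

    Uses the standard sort-based rule: find the shift theta such that
    sum_i max(est_i - theta, 0) = 1, then clip at zero.
    """
    theta = 0.0
    running = 0.0
    for i, u in enumerate(sorted(est, reverse=True), start=1):
        running += u
        candidate = (running - 1) / i
        if u - candidate > 0:  # u is still in the support of the projection
            theta = candidate
    return [max(x - theta, 0.0) for x in est]

raw = [0.6, 0.5, -0.1, 0.0]  # hypothetical noisy FO output: a negative entry, sum 1.0
print(project_to_simplex(raw))  # roughly [0.55, 0.45, 0.0, 0.0]
```

Note that even though `raw` already sums to one, removing the negative entry forces mass to be taken from the positive entries; a vector that is already a valid distribution is returned unchanged.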