Stefano Braghin

CR
13 papers
336 citations

13 Papers

LG Dec 16, 2022
Robust Learning Protocol for Federated Tumor Segmentation Challenge

Ambrish Rawat, Giulio Zizzo, Swanand Kadhe et al.

In this work, we devise robust and efficient learning protocols for orchestrating a Federated Learning (FL) process for the Federated Tumor Segmentation Challenge (FeTS 2022). Enabling FL for the FeTS setup is challenging mainly due to data heterogeneity among collaborators and the communication cost of training. To tackle these challenges, we propose the Robust Learning Protocol (RoLePRO), which combines server-side adaptive optimisation (e.g., server-side Adam) with judicious parameter (weights) aggregation schemes (e.g., adaptive weighted aggregation). RoLePRO takes a two-phase approach: the first phase consists of vanilla Federated Averaging, while the second phase consists of a judicious aggregation scheme that uses a sophisticated reweighting, all in the presence of an adaptive optimisation algorithm at the server. We draw insights from extensive experimentation to tune learning rates for the two phases.
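The combination described in this abstract, weighted aggregation of client models followed by an adaptive step at the server, can be sketched roughly as follows. This is a minimal illustration and not RoLePRO itself: the names `weighted_average` and `ServerAdam`, the hyperparameter defaults, and the use of sample counts as weights are all assumptions, and the paper's reweighting scheme is not reproduced here.

```python
import math

def weighted_average(client_params, weights):
    """Weighted mean of client parameter vectors (plain FedAvg-style aggregation)."""
    total = sum(weights)
    dim = len(client_params[0])
    return [
        sum(w * params[i] for params, w in zip(client_params, weights)) / total
        for i in range(dim)
    ]

class ServerAdam:
    """Adam applied at the server to the pseudo-gradient (global minus aggregate)."""

    def __init__(self, dim, lr=0.1, beta1=0.9, beta2=0.99, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim   # first-moment estimate
        self.v = [0.0] * dim   # second-moment estimate
        self.t = 0             # round counter, used for bias correction

    def step(self, global_params, aggregate):
        self.t += 1
        new_params = []
        for i, (p, a) in enumerate(zip(global_params, aggregate)):
            g = p - a  # pseudo-gradient: direction away from the client aggregate
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params

# Hypothetical round: two clients, weighted by (assumed) local sample counts.
global_model = [0.0, 0.0]
clients = [[1.0, -1.0], [3.0, -3.0]]
sample_counts = [1, 3]

aggregate = weighted_average(clients, sample_counts)
server = ServerAdam(dim=2)
global_model = server.step(global_model, aggregate)
print(aggregate, global_model)
```

Swapping the `weights` argument between rounds is one way a two-phase schedule could be expressed, with uniform or sample-count weights first and a different reweighting later.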

35.5 LG May 22
PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

Anisa Halimi, Liubov Nedoshivina, Kieran Fraser et al.

The growing availability of clinical data has increased the use of machine learning, yet centralized data aggregation is often infeasible for sensitive health information. Federated Learning (FL) offers a distributed alternative, but its adoption is limited by substantial heterogeneity across institutional datasets, making harmonization a critical but frequently overlooked prerequisite for multi-site analytics. We introduce PrivFusion, a privacy-preserving multi-agent framework that automates the harmonization of structured datasets prior to federated training. PrivFusion uses agents to analyze local data, cluster semantically similar features across sites, and provide iterative transformation recommendations until alignment is achieved. Evaluation across four heterogeneous COVID-19 datasets demonstrates that PrivFusion effectively and efficiently harmonizes multi-site data while substantially reducing manual effort.

CR Jul 4, 2019 Code
Diffprivlib: The IBM Differential Privacy Library

Naoise Holohan, Stefano Braghin, Pól Mac Aonghusa et al.

Since its conception in 2006, differential privacy has emerged as the de-facto standard in data privacy, owing to its robust mathematical guarantees, generalised applicability and rich body of literature. Over the years, researchers have studied differential privacy and its applicability to an ever-widening field of topics. Mechanisms have been created to optimise the process of achieving differential privacy, for various data types and scenarios. Until this work, however, all previous work on differential privacy has been conducted on an ad-hoc basis, without a single, unifying codebase to implement results. In this work, we present the IBM Differential Privacy Library, a general purpose, open source library for investigating, experimenting and developing differential privacy applications in the Python programming language. The library includes a host of mechanisms, the building blocks of differential privacy, alongside a number of applications to machine learning and other data analytics tasks. Simplicity and accessibility have been prioritised in developing the library, making it suitable for a wide audience of users, from those using the library for their first investigations in data privacy to privacy experts looking to contribute their own models and mechanisms for others to use.

CR Aug 31, 2021
DLPFS: The Data Leakage Prevention FileSystem

Stefano Braghin, Marco Simioni, Mathieu Sinn

Shared folders are still a common practice for granting third parties access to data files, regardless of the advances in data sharing technologies. Services like Google Drive, Dropbox, Box, and others provide infrastructures and interfaces to manage file sharing. The human factor is the weakest link, and data leaks caused by human error are regrettably common news. Such leaks take place both through mishandled data, for example data stored to the wrong directory, and through misconfigured or failing applications dumping data incorrectly. We present the Data Leakage Prevention FileSystem (DLPFS), a first attempt to systematically protect against data leakage caused by misconfigured applications or human error. This filesystem interface provides a privacy protection layer on top of the POSIX filesystem interface, allowing for seamless integration with existing infrastructures and applications, simply augmenting existing security controls. At the same time, DLPFS allows data administrators to protect files shared within an organisation by preventing unauthorised parties from accessing potentially sensitive content. DLPFS achieves this by transparently integrating with existing access control mechanisms. We empirically evaluate the impact of DLPFS on system performance to demonstrate the feasibility of the proposed solution.

CR Aug 10, 2021
Secure k-Anonymization over Encrypted Databases

Manish Kesarwani, Akshar Kaul, Stefano Braghin et al.

Data protection algorithms are becoming increasingly important to support modern business needs for facilitating data sharing and data monetization. Anonymization is an important step before data sharing. Several organizations rely on third parties for storing and managing data. However, third parties are often not trusted to store plaintext personal and sensitive data; data encryption is widely adopted to protect against intentional and unintentional attempts to read personal/sensitive data. Traditional encryption schemes do not support operations over the ciphertexts, and thus anonymizing encrypted datasets is not feasible with current approaches. This paper explores the feasibility and depth of implementing a privacy-preserving data publishing workflow over encrypted datasets by leveraging homomorphic encryption. We demonstrate how we can achieve uniqueness discovery, data masking, differential privacy and k-anonymity over encrypted data requiring zero knowledge about the original values. We prove that the security protocols followed by our approach provide strong guarantees against inference attacks. Finally, we experimentally demonstrate the performance of our data publishing workflow components.

CR Jul 21, 2021
Secure Random Sampling in Differential Privacy

Naoise Holohan, Stefano Braghin

Differential privacy is among the most prominent techniques for preserving privacy of sensitive data, owing to its robust mathematical guarantees and general applicability to a vast array of computations on data, including statistical analysis and machine learning. Previous work demonstrated that concrete implementations of differential privacy mechanisms are vulnerable to statistical attacks. This vulnerability is caused by the approximation of real values to floating point numbers. This paper presents a practical solution to the finite-precision floating point vulnerability, where the inverse transform sampling of the Laplace distribution can itself be inverted, thus enabling an attack where the original value can be retrieved with non-negligible advantage. The proposed solution has the advantages of being generalisable to any infinitely divisible probability distribution, and of simple implementation in modern architectures. Finally, the solution has been designed to make side-channel attacks infeasible, because brute-force attacks are inherently exponential in the size of the domain.

CR Jun 24, 2019
AnonTokens: tracing re-identification attacks through decoy records

Spiros Antonatos, Stefano Braghin, Naoise Holohan et al.

Privacy is of the utmost concern when it comes to releasing data to third parties. Data owners rely on anonymization approaches to safeguard the released datasets against re-identification attacks. However, even with strict anonymization in place, re-identification attacks are still a possibility and in many cases a reality. Prior art has focused on providing better anonymization algorithms with minimal loss of information and on how to prevent data disclosure attacks. Our approach tries to tackle the issue of tracing re-identification attacks based on the concept of honeytokens: decoy or "bait" records intended to lure malicious users. While the concept of honeytokens has been widely used in the security domain, this is the first approach to apply the concept to the data privacy domain. Records with high re-identification risk, called AnonTokens, are inserted into anonymized datasets. This work demonstrates the feasibility, detectability and usability of AnonTokens and provides promising results for data owners who want to apply our approach to real use cases. We evaluated our concept with real large-scale population datasets. The results show that the introduction of decoy tokens is feasible without significant impact on the released dataset.

CR Aug 30, 2018
The Bounded Laplace Mechanism in Differential Privacy

Naoise Holohan, Spiros Antonatos, Stefano Braghin et al.

The Laplace mechanism is the workhorse of differential privacy, applied to many instances where numerical data is processed. However, the Laplace mechanism can return semantically impossible values, such as negative counts, due to its infinite support. There are two popular solutions to this: (i) bounding/capping the output values and (ii) bounding the mechanism support. In this paper, we show that bounding the mechanism support, while using the parameters of the pure Laplace mechanism, does not typically preserve differential privacy. We also present a robust method to compute the optimal mechanism parameters to achieve differential privacy in such a setting.
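The problem this abstract starts from, and option (i), can be illustrated with a short sketch. It assumes a counting query with sensitivity 1 and an arbitrary ε of 0.5; the function names are hypothetical. It does not implement the bounded Laplace mechanism or the paper's parameter computation, and the sampler uses ordinary floating-point arithmetic, so it is for illustration only and not a secure implementation.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Pure Laplace mechanism: add noise with scale sensitivity / epsilon."""
    return value + laplace_noise(sensitivity / epsilon, rng)

def clamp(x, lower, upper):
    """Option (i): cap the released value to the valid output range."""
    return max(lower, min(upper, x))

rng = random.Random(0)   # fixed seed only so the demo is repeatable
true_count = 1           # hypothetical small count

noisy = [laplace_mechanism(true_count, 1.0, 0.5, rng) for _ in range(1000)]
negatives = sum(1 for v in noisy if v < 0)
clamped = [clamp(v, 0.0, float("inf")) for v in noisy]

print(f"{negatives} of {len(noisy)} noisy counts are negative")
print(f"minimum after clamping: {min(clamped)}")
```

Clamping is post-processing of an already private output, so it keeps the ε guarantee. Restricting the support of the noise distribution itself is a different operation, which is the case the abstract says needs recomputed parameters.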

CR Oct 4, 2017
($k$,$ε$)-Anonymity: $k$-Anonymity with $ε$-Differential Privacy

Naoise Holohan, Spiros Antonatos, Stefano Braghin et al.

The explosion in volume and variety of data offers enormous potential for research and commercial use. Increased availability of personal data is of particular interest in enabling highly customised services tuned to individual needs. Preserving the privacy of individuals against reidentification attacks in this fast-moving ecosystem poses significant challenges for a one-size-fits-all approach to anonymisation. In this paper we present ($k$,$ε$)-anonymisation, an approach that combines the $k$-anonymisation and $ε$-differential privacy models into a single coherent framework, providing privacy guarantees at least as strong as those offered by the individual models. Linking risks of less than 5% are observed in experimental results, even with modest values of $k$ and $ε$. Our approach is shown to address well-known limitations of $k$-anonymity and $ε$-differential privacy and is validated in an extensive experimental campaign using openly available datasets.
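For readers unfamiliar with the $k$-anonymity half of this combination, the following sketch computes the $k$ that a table satisfies: the size of the smallest group of records sharing the same quasi-identifier values. The function name `k_anonymity_level`, the column names and the toy records are hypothetical, and the $ε$-differential privacy component of the paper's framework is not covered.

```python
from collections import Counter

def k_anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest equivalence class over the quasi-identifiers."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0

# Hypothetical, already generalised records (ZIP prefix and age band).
table = [
    {"zip": "021**", "age": "30-39", "diagnosis": "A"},
    {"zip": "021**", "age": "30-39", "diagnosis": "B"},
    {"zip": "946**", "age": "40-49", "diagnosis": "A"},
    {"zip": "946**", "age": "40-49", "diagnosis": "C"},
]

print(k_anonymity_level(table, ["zip", "age"]))
```

A table is $k$-anonymous for any $k$ up to the returned value, so a release check could compare this number against the required $k$.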

IR Dec 26, 2014
Predicting User Engagement in Twitter with Collaborative Ranking

Ernesto Diaz-Aviles, Hoang Thanh Lam, Fabio Pinelli et al.

Collaborative Filtering (CF) is a core component of popular web-based services such as Amazon, YouTube, Netflix, and Twitter. Most applications use CF to recommend a small set of items to the user. For instance, YouTube presents to a user a list of top-n videos she would likely watch next based on her rating and viewing history. Current methods of CF evaluation have been focused on assessing the quality of a predicted rating or the ranking performance for top-n recommended items. However, restricting the recommender system evaluation to these two aspects is rather limiting and neglects other dimensions that could better characterize a well-perceived recommendation. In this paper, instead of optimizing rating or top-n recommendation, we focus on the task of predicting which items generate the highest user engagement. In particular, we use Twitter as our testbed and cast the problem as a Collaborative Ranking task where the rich features extracted from the metadata of the tweets help to complement the transaction information limited to user ids, item ids, ratings and timestamps. We learn a scoring function that directly optimizes the user engagement in terms of nDCG@10 on the predicted ranking. Experiments conducted on an extended version of the MovieTweetings dataset, released as part of the RecSys Challenge 2014, show the effectiveness of our approach.
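The metric named in this abstract, nDCG@10, can be computed as in the sketch below. It assumes linear gain and a log2 rank discount, which is one common formulation; the abstract does not say which variant the authors optimise, and the engagement values are made up for illustration.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the predicted order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical engagement values, listed in the order the model ranked the tweets.
predicted_order = [3, 0, 2, 1]
print(ndcg_at_k(predicted_order, k=10))
```

A perfectly ordered list scores 1.0, and placing high-engagement items lower in the ranking reduces the score through the logarithmic discount.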

CR Mar 11, 2014
Answering queries using pairings

Alberto Trombetta, Giuseppe Persiano, Stefano Braghin

Outsourcing data to the cloud has become very common. Since, generally speaking, cloud data storage and management providers cannot be fully trusted, mechanisms providing confidentiality of the stored data are necessary. A possible solution is to encrypt all the data, but this poses serious problems for the effective usefulness of the stored data. In this work, we propose to apply a well-known attribute-based cryptographic scheme to cope with the problem of querying encrypted data. We have implemented the proposed scheme with a real-world, off-the-shelf RDBMS and we provide several experimental results showing the feasibility of our approach.

CR Jul 10, 2013
Secure and Policy-Private Resource Sharing in an Online Social Network

Stefano Braghin, Vincenzo Iovino, Giuseppe Persiano et al.

Providing functionalities that allow online social network users to manage the publication of their information and/or resources in a secure and private way is a relevant and far from trivial topic that has been under scrutiny from various research communities. In this work, we provide a framework that allows users to define highly expressive access policies for their resources in such a way that enforcement does not require the intervention of a (trusted or not) third party. This is made possible by the deployment of a newly defined cryptographic primitive that provides, among other things, efficient access revocation and access policy privacy. Finally, we provide an implementation of our framework as a Facebook application, proving the feasibility of our approach.

SI Mar 4, 2013
The Zen of Multidisciplinary Team Recommendation

Anwitaman Datta, Stefano Braghin, Jackson Tan Teck Yong

In order to accomplish complex tasks, it is often necessary to compose a team consisting of experts with diverse competencies. However, for proper functioning, it is also preferable that a team be socially cohesive. A team recommendation system, which facilitates the search for potential team members, can be of great help both for (i) individuals who need to seek out collaborators and (ii) managers who need to build a team for some specific tasks. A decision support system which readily helps summarize such metrics, and possibly rank the teams in a personalized manner according to the end users' preferences, can be a great tool to navigate what would otherwise be an information avalanche. In this work we present a general framework of how to compose such subsystems together to build a composite team recommendation system, and instantiate it for a case study of academic teams.