LG Oct 17, 2022
Industry-Scale Orchestrated Federated Learning for Drug Discovery
Martijn Oldenhof, Gergely Ács, Balázs Pejó et al.
To apply federated learning to drug discovery, we developed a novel platform in the context of the European Innovative Medicines Initiative (IMI) project MELLODDY (grant n°831472), which comprised 10 pharmaceutical companies, academic research labs, large industrial companies, and startups. The MELLODDY platform was the first industry-scale platform to enable the creation of a global federated model for drug discovery without sharing the confidential data sets of the individual partners. The federated model was trained on the platform by aggregating the gradients of all contributing partners in a cryptographically secure way after each training iteration. The platform was deployed on an Amazon Web Services (AWS) multi-account architecture running Kubernetes clusters in private subnets. Organisationally, the roles of the different partners were codified as different rights and permissions on the platform and administered in a decentralized way. The MELLODDY platform generated new scientific discoveries, which are described in a companion paper.
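The idea of aggregating partner gradients without revealing any single partner's contribution can be illustrated with a toy pairwise-masking scheme. This is a minimal sketch and not the MELLODDY protocol: the modulus, the integer-quantized updates, the single seeded generator standing in for pairwise key agreement, and all function names are assumptions made for illustration.

```python
import random

MODULUS = 2**32  # assumed ring size; real systems choose this with care


def pairwise_masks(num_clients, length, seed=0):
    """Build one mask per client such that all masks sum to zero mod MODULUS.

    Each pair (i, j) shares a random vector: client i adds it, client j
    subtracts it. A single seeded RNG stands in for pairwise key agreement.
    """
    rng = random.Random(seed)
    masks = [[0] * length for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k in range(length):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """Hide a client's integer-quantized update behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Sum the masked updates; the pairwise masks cancel in the total."""
    total = [0] * len(masked_updates[0])
    for masked in masked_updates:
        for k, value in enumerate(masked):
            total[k] = (total[k] + value) % MODULUS
    return total


# Three hypothetical partners with already-quantized gradient updates.
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), length=3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
result = aggregate(masked)  # equals the element-wise sum of `updates`
```

The aggregator only ever sees `masked`, yet `result` matches the plain sum because every shared vector is added once and subtracted once.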
CR Jun 30, 2025
Detect & Score: Privacy-Preserving Misbehaviour Detection and Contribution Evaluation in Federated Learning
Marvin Xhemrishi, Alexandre Graell i Amat, Balázs Pejó
Federated learning with secure aggregation enables private and collaborative learning from decentralised data without leaking sensitive client information. However, secure aggregation also complicates the detection of malicious client behaviour and the evaluation of individual client contributions to the learning. To address these challenges, QI (Pejó et al.) and FedGT (Xhemrishi et al.) were proposed for contribution evaluation (CE) and misbehaviour detection (MD), respectively. QI, however, lacks adequate MD accuracy because it relies on the random selection of clients in each training round, while FedGT lacks CE capability. In this work, we combine the strengths of QI and FedGT to achieve both robust MD and accurate CE. Our experiments demonstrate superior performance compared to using either method independently.
CR Sep 14, 2021
The Effect of False Positives: Why Fuzzy Message Detection Leads to Fuzzy Privacy Guarantees?
István András Seres, Balázs Pejó, Péter Burcsi
Fuzzy Message Detection (FMD) is a recent cryptographic primitive invented by Beck et al. (CCS'21) where an untrusted server performs coarse message filtering for its clients in a recipient-anonymous way. In FMD, besides their true positive messages, clients also download from the server cover messages determined by their false-positive detection rates. Moreover, within FMD, the server cannot distinguish between genuine and cover traffic. In this paper, we formally analyze the privacy guarantees of FMD from three different angles. First, we analyze three privacy provisions offered by FMD: recipient unlinkability, relationship anonymity, and temporal detection ambiguity. Second, we perform a differential privacy analysis and coin a relaxed definition to capture the privacy guarantees FMD yields. Finally, we simulate FMD on real-world communication data. Our theoretical and empirical results assist FMD users in adequately selecting their false-positive detection rates for various applications with given privacy requirements.
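The role of the false-positive detection rate can be illustrated with a small simulation of the server's filtering outcome. This is a sketch of the flagging behaviour only, with no cryptography: in real FMD the server tests messages against a detection key and never learns the true recipient, whereas this hypothetical `server_filter` reads the recipient directly purely to simulate the result. All names and the message format are assumptions.

```python
import random


def server_filter(messages, recipient, fp_rate, rng):
    """Return the ids of messages flagged for `recipient`.

    A message is always flagged if it is truly addressed to `recipient`
    (true positive); any other message is flagged with probability
    `fp_rate` (cover traffic).
    """
    flagged = []
    for msg_id, true_recipient in messages:
        if true_recipient == recipient or rng.random() < fp_rate:
            flagged.append(msg_id)
    return flagged


# Hypothetical traffic: 1000 messages spread over 10 recipients.
rng = random.Random(42)
messages = [(i, f"user{i % 10}") for i in range(1000)]

genuine = sum(1 for _, r in messages if r == "user3")
download = server_filter(messages, "user3", fp_rate=0.05, rng=rng)
cover = len(download) - genuine  # roughly fp_rate * (1000 - genuine)
```

A higher `fp_rate` means more cover messages, and therefore more bandwidth, in exchange for making the recipient's genuine traffic harder to single out.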
CR Feb 16, 2021
Revenue Attribution on iOS 14 using Conversion Values in F2P Games
Frederick Ayala-Gomez, Ismo Horppu, Erlin Gulbenkoglu et al.
Mobile app developers use paid advertising campaigns to acquire new users. Marketing managers decide where to spend and how much to spend based on the campaigns' performance. Apple's new privacy mechanisms have a profound impact on how performance marketing is measured. Starting with iOS 14.5, all apps must explicitly request system permission for tracking via the new App Tracking Transparency framework, which shows users a pop-up asking whether they give the app permission to track. If a user does not allow tracking, the identifier required to deterministically find the online advertising campaign that brought the user to install the app is not shared. Instead of relying on individual identifiers, Apple proposed a new performance mechanism called conversion value, which is an integer set by the app for each user; developers can then obtain the number of installs per conversion value for each campaign. However, interpreting how conversion values are used to measure the campaigns' performance is not obvious because it requires a method to translate the conversion values to revenue. This paper investigates the task of attributing revenue to advertising campaigns using the reported conversion values per campaign. Our contributions are to formalize the problem, find the theoretically optimal revenue attribution function for any conversion value schema, and show empirical results on past data of a free-to-play mobile game using different conversion value schemas.
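One simple way to translate conversion values into revenue is to weight each campaign's install counts by the historical mean revenue observed for each conversion value. The sketch below illustrates that baseline only; it is not claimed to be the optimal attribution function derived in the paper, and the function names, data layout, and numbers are hypothetical.

```python
from collections import defaultdict


def mean_revenue_per_cv(history):
    """Estimate mean revenue per conversion value from (cv, revenue) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cv, revenue in history:
        totals[cv] += revenue
        counts[cv] += 1
    return {cv: totals[cv] / counts[cv] for cv in totals}


def attribute_revenue(installs_per_cv, mean_revenue):
    """Attribute revenue to a campaign from its install counts per conversion value.

    Conversion values never seen in the history contribute zero (an assumption).
    """
    return sum(n * mean_revenue.get(cv, 0.0) for cv, n in installs_per_cv.items())


# Hypothetical historical users: (conversion value, revenue).
history = [(0, 0.0), (0, 0.0), (1, 2.0), (1, 4.0), (2, 10.0)]
mean_revenue = mean_revenue_per_cv(history)

# Hypothetical campaign report: installs per conversion value.
campaign_a = {0: 5, 1: 2, 2: 1}
attributed = attribute_revenue(campaign_a, mean_revenue)
```

Here the campaign is credited 2 installs at a mean of 3.0 plus 1 install at 10.0, so `attributed` is 16.0; how good such an estimate is depends on how the conversion value schema partitions users.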
LG Jul 13, 2020
Quality Inference in Federated Learning with Secure Aggregation
Balázs Pejó, Gergely Biczók
Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data, respectively. Despite no data being shared explicitly, recent studies showed that the mechanism could still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality of individual training datasets and show that such quality information could be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to detect misbehaviours, to stabilize training performance, and to measure the individual contributions of participants.
CR Jun 4, 2019
SoK: Differential Privacies
Damien Desfontaines, Balázs Pejó
Shortly after it was first introduced in 2006, differential privacy became the flagship data privacy definition. Since then, numerous variants and extensions were proposed to adapt it to different scenarios and attacker models. In this work, we propose a systematic taxonomy of these variants and extensions. We list all data privacy definitions based on differential privacy, and partition them into seven categories, depending on which aspect of the original definition is modified. These categories act like dimensions: variants from the same category cannot be combined, but variants from different categories can be combined to form new definitions. We also establish a partial ordering of relative strength between these notions by summarizing existing results. Furthermore, we list which of these definitions satisfy some desirable properties, like composition, post-processing, and convexity by either providing a novel proof or collecting existing ones.
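For reference, the original definition that these variants modify can be stated in its standard form. The symbols below are chosen here for illustration and are not taken from the abstract: a randomized mechanism $\mathcal{M}$ is $\varepsilon$-differentially private if the following holds for all datasets $D, D'$ differing in a single record and all sets of outputs $S$.

```latex
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
\]
```

Each ingredient of this inequality (the notion of neighbouring datasets, the way the two distributions are compared, the assumed attacker, and so on) is a place where a variant could plausibly depart from the original.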